An independent and comprehensive evaluation of the CAZyme classifiers:

  • dbCAN2
  • dbCAN3
  • dbCAN4
  • eCAMI
  • CUPP
  • HMMER (dbCAN4)
  • dbCAN-sub (dbCAN4)
  • DIAMOND (dbCAN4)

Evaluating performance of:

  • Binary CAZyme/non-CAZyme classification
  • Test set dependent performance of CAZyme/non-CAZyme classification
  • Binary classification per CAZy class
  • Multilabel CAZy class classification
  • Binary classification per CAZy family
  • Multilabel CAZy family classification

1 CAZyme classifier references and names

The CAZyme classifier dbCAN is available as a webserver and as a standalone tool. This evaluation used the standalone tool, which is referred to as dbCAN; the webserver is referred to as the dbCAN webserver. The version numbers of the standalone tool and the webserver are independent of one another:

  • The dbCAN2 webserver initially ran dbCAN version 2 (referred to as dbCAN2)
  • The dbCAN2 webserver then implemented the standalone dbCAN version 3 (referred to as dbCAN3)
  • The dbCAN3 webserver implements the standalone dbCAN version 4 (referred to as dbCAN4)

Each version of dbCAN implements multiple sequence alignment and modelling tools:

  • dbCAN2:
      • DIAMOND
      • HMMER
      • Hotpep
  • dbCAN3:
      • DIAMOND
      • HMMER
      • eCAMI
  • dbCAN4:
      • DIAMOND
      • HMMER
      • dbCAN-sub (implementation of HMMER)

All references to HMMER and DIAMOND refer to the implementation of these tools by dbCAN. Where no dbCAN version is stated, they refer specifically to the implementation of HMMER and DIAMOND by dbCAN4.

dbCAN2 and dbCAN3

Han Zhang and others, dbCAN2: a meta server for automated carbohydrate-active enzyme annotation, Nucleic Acids Research, Volume 46, Issue W1, 2 July 2018, Pages W95–W101

dbCAN4 and dbCAN-sub (dbCAN)

Zheng J, Ge Q, Yan Y, Zhang X, Huang L, Yin Y. dbCAN3: automated carbohydrate-active enzyme and substrate annotation. Nucleic Acids Res. 2023 Jul 5;51(W1):W115-W121

eCAMI

Xu J, Zhang H, Zheng J, Dovoedo P, Yin Y. eCAMI: simultaneous classification and motif identification for enzyme annotation. Bioinformatics. 2020 Apr 1;36(7):2068-2075

CUPP

Barrett, K., Lange, L. Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP). Biotechnol Biofuels 12, 102 (2019). https://doi.org/10.1186/s13068-019-1436-5

HMMER

Eddy SR. Profile hidden Markov models. Bioinformatics. 1998;14(9):755-63.

DIAMOND

Buchfink, B., Xie, C. & Huson, D. Fast and sensitive protein alignment using DIAMOND. Nat Methods 12, 59–60 (2015). https://doi.org/10.1038/nmeth.3176

2 Introduction

CAZyme classifiers analyse query protein sequences and predict CAZyme domains and their associated CAZy family annotations. This enables exploratory analysis of CAZyme complements not presently catalogued in the CAZy database (www.cazy.org). Each CAZyme classifier implements a different method to predict CAZy family annotations.

We previously published an evaluation of dbCAN2, eCAMI and CUPP (Hobbs et al., 2021). Since then, two standalone versions of dbCAN have been released (dbCAN3 and dbCAN4). Additionally, the previous analysis was limited to 70 genomes and was overweighted towards bacterial genomes. To address these points, we present here an independent and comprehensive evaluation of the CAZyme classifiers:

  • dbCAN2 (v2.0.11)
  • dbCAN3 (v3.0.7)
  • dbCAN4 (v4.0.0)
  • eCAMI (as implemented by dbCAN3, v3.0.7)
  • CUPP (v???)
  • HMMER (from dbCAN4)
  • dbCAN-sub (from dbCAN4)
  • DIAMOND (from dbCAN4)

Evaluating performance of:

  • Binary CAZyme/non-CAZyme classification
  • Test set dependent performance of CAZyme/non-CAZyme classification
  • Binary classification per CAZy class
  • Multilabel CAZy class classification
  • Binary classification per CAZy family
  • Multilabel CAZy family classification

Hobbs, Emma E. M.; Gloster, Tracey M.; Chapman, Sean; Pritchard, Leighton (2021). Microbiology Society Annual Conference 2021. figshare. Poster. https://doi.org/10.6084/m9.figshare.14370836.v3

3 Test sets

One test set was created per genomic assembly selected for inclusion in the benchmark. Each test set comprises 100 CAZymes and the 100 non-CAZymes with the highest sequence similarity (rated by bit-score ratio).

Choosing the 100 non-CAZymes with the highest sequence similarity was intended to increase the probability of confusing the classifiers, and so to give a better idea of the performance to expect when using them. Equal numbers of CAZymes and non-CAZymes were selected to prevent over-representation of one population over the other.
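The selection step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in this evaluation: the record layout, the `is_cazyme` and `bit_score_ratio` keys, and the random sampling of the 100 CAZymes are all assumptions, and the bit-score ratio is taken as precomputed.

```python
import random


def build_test_set(proteins, n=100, seed=0):
    """Return n CAZymes and the n non-CAZymes with the highest bit-score ratio.

    proteins: list of dicts with hypothetical keys 'id', 'is_cazyme' (bool)
    and 'bit_score_ratio' (float, only needed for non-CAZymes).
    """
    cazymes = [p for p in proteins if p["is_cazyme"]]
    non_cazymes = [p for p in proteins if not p["is_cazyme"]]
    if len(cazymes) < n or len(non_cazymes) < n:
        raise ValueError("assembly needs at least n CAZymes and n non-CAZymes")

    # Assumption: the CAZymes are sampled at random from the CAZome.
    rng = random.Random(seed)
    chosen_cazymes = rng.sample(cazymes, n)

    # Keep the non-CAZymes with the highest similarity score, i.e. those
    # most likely to be confused with CAZymes.
    ranked = sorted(non_cazymes, key=lambda p: p["bit_score_ratio"], reverse=True)
    return chosen_cazymes, ranked[:n]
```

Fixing the seed keeps the sampled CAZymes reproducible between runs.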

For a genomic assembly to be included in the creation of a test set, the assembly had to meet all of the following criteria:

  • Contains at least 100 CAZymes
  • Contains at least 100 non-CAZymes
  • Has an ‘Assembly level’ of ‘Complete Genome’ in the NCBI Assembly database
  • Protein records are still present in NCBI
  • Not listed as an ‘Anomalous assembly’ in the NCBI Assembly database
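As an illustration, the criteria above can be expressed as a single predicate over an assembly summary. The field names below are hypothetical and do not correspond to any NCBI API; the sketch assumes the counts and flags were gathered beforehand.

```python
def is_eligible(assembly, min_proteins=100):
    """Return True if an assembly summary meets all inclusion criteria.

    assembly: dict with hypothetical keys 'n_cazymes', 'n_non_cazymes',
    'assembly_level', 'proteins_in_ncbi' (bool) and 'anomalous' (bool).
    """
    return bool(
        assembly["n_cazymes"] >= min_proteins
        and assembly["n_non_cazymes"] >= min_proteins
        and assembly["assembly_level"] == "Complete Genome"
        and assembly["proteins_in_ncbi"]
        and not assembly["anomalous"]
    )
```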

The genomic assemblies were also chosen from a range of taxonomies, to provide an informative picture of the performance of the classifiers over the range of datasets that users may wish to analyse.

We took the 70 test sets used in the previous evaluation (Hobbs et al., 2021), and added an additional 10 genomes.

Summary of CAZome coverage across the test sets:

  • Mean percentage of the genome incorporated in the CAZome across all test sets: 3.22
  • Standard deviation of the percentage of the genome incorporated in the CAZome across all test sets: 1.17
  • Mean percentage of the CAZome incorporated in the test set across all genomes: 58.6
  • Standard deviation of the percentage of the CAZome incorporated in the test set across all genomes: 27.55

Figure 3.1: Histogram of the CAZome coverage of the test sets for each source genomic assembly, overlaid by a box and whisker plot of the percentage of the CAZome incorporated in the test set.

4 CAZyme/non-CAZyme classification

The assignment of CAZy family annotations by a CAZyme classifier identifies the protein as a CAZyme. If no CAZy family annotations are assigned to a protein by a CAZyme classifier, the tool identified the protein as a non-CAZyme. Here we evaluate the performance of each CAZyme classifier to differentiate between CAZymes and non-CAZymes (defined as proteins catalogued and not catalogued in CAZy respectively).
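The decision rule above (any family annotation means "CAZyme", none means "non-CAZyme") can be sketched as a small counting function. The input layout is an assumption made for illustration: a dictionary of predicted families per protein and a set of protein identifiers known to be catalogued in CAZy.

```python
def confusion_counts(predicted_families, known_cazymes):
    """Count TP, FP, TN and FN for one classifier on one test set.

    predicted_families: dict mapping protein id -> list of predicted CAZy
        families (an empty list means no annotation was assigned).
    known_cazymes: set of protein ids catalogued in CAZy.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for protein_id, families in predicted_families.items():
        predicted_cazyme = len(families) > 0      # any annotation => CAZyme
        is_cazyme = protein_id in known_cazymes   # ground truth from CAZy
        if predicted_cazyme and is_cazyme:
            counts["TP"] += 1
        elif predicted_cazyme and not is_cazyme:
            counts["FP"] += 1
        elif not predicted_cazyme and is_cazyme:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts
```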

4.1 Summary statistics

For every classifier-test set pair, the specificity, sensitivity, precision, F1-score and accuracy were calculated. The mean of each statistic was calculated for each classifier across all test sets, to represent the overall performance of that classifier. The 95% confidence interval (CI) was also calculated, owing to the tendency of the mean to skew towards 1. These results are presented in table 4.1.

Table 4.1: Overall performance of CAZyme classifiers differentiation between CAZymes and non-CAZymes
Classifier Spec_Mean Spec_SD Spec_Lower_CI Spec_Upper_CI Sens_Mean Sens_SD Sens_Lower_CI Sens_Upper_CI Prec_Mean Prec_SD Prec_Lower_CI Prec_Upper_CI F1_Mean F1_SD F1_Lower_CI F1_Upper_CI Acc_Mean Acc_SD Acc_Lower_CI Acc_Upper_CI
CUPP 0.9818 0.0487 0.9709 0.9926 0.8539 0.0744 0.8373 0.8704 0.9812 0.0454 0.9711 0.9913 0.9109 0.0520 0.8994 0.9225 0.9178 0.0460 0.9076 0.9280
dbCAN_2 0.9772 0.0526 0.9655 0.9890 0.9012 0.1082 0.8772 0.9253 0.9779 0.0453 0.9679 0.9880 0.9332 0.0789 0.9157 0.9508 0.9392 0.0596 0.9260 0.9525
dbCAN_2:DIAMOND 0.9744 0.0542 0.9623 0.9864 0.9158 0.1287 0.8871 0.9444 0.9758 0.0448 0.9659 0.9858 0.9380 0.0923 0.9175 0.9586 0.9451 0.0673 0.9301 0.9600
dbCAN_2:HMMER 0.9786 0.0517 0.9671 0.9901 0.8779 0.0807 0.8599 0.8958 0.9788 0.0446 0.9689 0.9887 0.9225 0.0639 0.9083 0.9367 0.9282 0.0473 0.9177 0.9388
dbCAN_2:Hotpep 0.9759 0.0459 0.9657 0.9861 0.8084 0.1268 0.7801 0.8366 0.9717 0.0535 0.9598 0.9836 0.8771 0.0916 0.8567 0.8975 0.8921 0.0714 0.8762 0.9080
dbCAN_3 0.9592 0.0768 0.9422 0.9763 0.9871 0.0436 0.9774 0.9968 0.9644 0.0556 0.9521 0.9768 0.9742 0.0410 0.9651 0.9833 0.9732 0.0436 0.9635 0.9829
dbCAN_3:DIAMOND 0.9665 0.0736 0.9501 0.9829 0.9689 0.0823 0.9506 0.9872 0.9706 0.0543 0.9586 0.9827 0.9664 0.0660 0.9517 0.9811 0.9677 0.0542 0.9556 0.9798
dbCAN_3:eCAMI 0.9752 0.0475 0.9647 0.9858 0.8516 0.1278 0.8232 0.8801 0.9738 0.0463 0.9635 0.9841 0.9026 0.0870 0.8832 0.9220 0.9134 0.0682 0.8983 0.9286
dbCAN_3:HMMER 0.9788 0.0520 0.9672 0.9903 0.8880 0.0818 0.8698 0.9062 0.9791 0.0448 0.9692 0.9891 0.9282 0.0646 0.9139 0.9426 0.9334 0.0481 0.9227 0.9441
dbCAN_4 0.9622 0.0763 0.9453 0.9792 0.9841 0.0640 0.9699 0.9984 0.9672 0.0555 0.9549 0.9796 0.9733 0.0531 0.9614 0.9851 0.9732 0.0491 0.9623 0.9841
dbCAN_4:dbCAN-sub 0.9775 0.0529 0.9657 0.9893 0.9498 0.0729 0.9335 0.9660 0.9792 0.0432 0.9696 0.9888 0.9618 0.0563 0.9493 0.9744 0.9636 0.0449 0.9536 0.9736
dbCAN_4:DIAMOND 0.9668 0.0744 0.9502 0.9833 0.9694 0.0835 0.9508 0.9880 0.9710 0.0545 0.9589 0.9832 0.9666 0.0694 0.9511 0.9820 0.9681 0.0547 0.9559 0.9802
dbCAN_4:HMMER 0.9788 0.0520 0.9672 0.9903 0.8900 0.0819 0.8718 0.9082 0.9792 0.0447 0.9692 0.9891 0.9293 0.0648 0.9149 0.9438 0.9344 0.0483 0.9236 0.9451
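For reference, the five statistics and a mean with a 95% interval can be computed from confusion-matrix counts as sketched below. The interval here uses a normal approximation (z of about 1.96) purely to stay within the Python standard library; this is an assumption, and the method used to produce the intervals in table 4.1 may differ (for example, a t-distribution).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def binary_metrics(tp, fp, tn, fn):
    """Return specificity, sensitivity, precision, F1-score and accuracy."""
    def ratio(numerator, denominator):
        # Undefined ratios (zero denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "specificity": ratio(tn, tn + fp),
        "sensitivity": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def mean_with_ci(values, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width
```

An interval built this way is symmetric about the mean and is not constrained to lie within 0 and 1.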

The 95% CI was plotted as error bars around the mean (figure 4.1).

Figure 4.1: Summary statistics of CAZyme classifier performance for binary CAZyme/non-CAZyme prediction, showing the mean plus and minus the 95% confidence interval.

4.2 Specificity

Specificity is the proportion of known negatives (known non-CAZymes) which are correctly classified as negatives (non-CAZymes). Figure 4.2 is a graphical representation of the results calculated in table 4.1.

Figure 4.2: One-dimensional scatter plot of specificity scores of CAZyme and non-CAZyme predictions per test set, overlaying box plot of standard deviation.

4.3 Sensitivity

Sensitivity (also known as recall) is the proportion of known positives (CAZymes) that are correctly identified as positives (CAZymes). Figure 4.3 is a graphical representation of the results calculated in table 4.1.

Figure 4.3: One-dimensional scatter plot of recall (sensitivity) scores of CAZyme and non-CAZyme predictions per test set, overlaying box plot of standard deviation.

4.4 Precision

Precision is the proportion of positive predictions by the classifiers that are correct. In this case, precision represents the fraction of CAZyme predictions by the classifiers that are correct, specifically the proportion of predicted CAZymes that are known CAZymes. Figure 4.4 is a visual representation of the results calculated in table 4.1.

Figure 4.4: One-dimensional scatter plot of precision scores of CAZyme and non-CAZyme predictions per test set, overlaying box plot of standard deviation.

4.5 F1-score

The F1-score is the harmonic mean of recall and precision and summarises the overall performance of the tool, with 0 being the worst and 1 being the best performance. Figure 4.5 shows the F1-score from each test set, for each classifier.

Figure 4.5: Bar chart of the F1-score of CAZyme classifiers' differentiation between CAZymes and non-CAZymes.

4.6 Accuracy

Accuracy, calculated as (TP + TN) / (TP + TN + FP + FN), where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, provides an idea of the overall performance of the classifiers as a measure of the degree to which their CAZyme/non-CAZyme predictions conform to the correct result. Figure 4.6 is a plot of the respective data from table 4.1.

Figure 4.6: Bar chart of the accuracy of CAZyme classifiers' differentiation between CAZymes and non-CAZymes.

4.7 Combined statistics plot

Here we generate a plot that combines the plots from above into a single figure.

Figure 4.7: Box and whisker plots of the performance of CAZyme/non-CAZyme classification.

4.8 ROC curve - Receiver Operator Characteristic curve

The Receiver Operator Characteristic (ROC) curve (figure 4.8) enables us to compare sensitivity with specificity by plotting sensitivity against 1 - specificity.

Figure 4.8: Receiver Operator Characteristic (ROC) curve of CAZyme/non-CAZyme classification.
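The coordinates plotted in ROC space can be derived directly from the confusion-matrix counts. The sketch below assumes that each classifier returns a single yes/no call per protein, so that each classifier-test set pair contributes one point and not a full threshold sweep.

```python
def roc_point(tp, fp, tn, fn):
    """Return (false positive rate, true positive rate) for one test set.

    The false positive rate equals 1 - specificity and the true positive
    rate equals sensitivity. Both classes must be present in the test set,
    otherwise a denominator is zero.
    """
    false_positive_rate = fp / (fp + tn)
    true_positive_rate = tp / (tp + fn)
    return false_positive_rate, true_positive_rate
```

A perfect classifier sits at (0, 1), the top-left corner of the plot.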

4.9 Expected Range of Accuracy

The statistics evaluated above provide an idea of the general performance of the tools, but not of the expected range of performance. Specifically, the data do not give a clear picture of the best and worst performance a user can expect when using these tools.

To compare the expected typical range in accuracies for each classifier, 6 test sets (identified by the source genomic assemblies) were selected at random. The CAZyme/non-CAZyme predictions for each classifier, for each test set, were bootstrap resampled 100 times each, and for each bootstrap sample the accuracy calculated. The accuracies of the bootstrap samples for each classifier were plotted on stacked histograms, shown in figure 4.9.

Figure 4.9: Stacked histograms of bootstrap sample accuracies of CAZyme classifiers' differentiation between CAZymes and non-CAZymes. 6 test sets (identified by their source genomic assembly) were selected at random. The CAZyme/non-CAZyme predictions for each classifier, for each test set, were bootstrap resampled 100 times. The accuracies of the 600 bootstrap samples per classifier were plotted as a stacked histogram.
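The bootstrap procedure can be sketched as follows for one classifier on one test set. This is an illustrative implementation using only the Python standard library; the 0/1 label encoding, the function name and the fixed seed are assumptions.

```python
import random


def bootstrap_accuracies(truth, predicted, n_boot=100, seed=0):
    """Resample proteins with replacement and return one accuracy per resample.

    truth, predicted: equal-length lists of 0/1 labels (1 = CAZyme).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    rng = random.Random(seed)
    n = len(truth)
    accuracies = []
    for _ in range(n_boot):
        # Draw n protein indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        correct = sum(truth[i] == predicted[i] for i in sample)
        accuracies.append(correct / n)
    return accuracies
```

Running this for 6 test sets gives 600 accuracies per classifier, which is the quantity shown in each histogram.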

4.10 Conclusions on the Binary CAZyme/non-CAZyme Prediction Performance

Overall, all tools showed a low probability of producing false positives (misclassifying a non-CAZyme as a CAZyme), and few of the positive predictions were false positives. We can therefore be confident that the CAZyme predictions made by each of these tools are most likely correct. However, all the classifiers consistently failed to identify every CAZyme within a CAZome, so we should not presume that all non-CAZyme predictions are correct. These classifiers are unlikely to identify the complete CAZome, although a near-complete CAZome will be accurately identified.

5 CAZyme/non-CAZyme classification: Taxonomic evaluation

The performance of a classifier may vary between taxonomy groups. For this evaluation the test sets were separated into the taxonomy groups:

  • Bacteria
  • Eukaryote

The performance of each classifier per taxonomy group was compared with its performance across all test sets pooled together.
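A per-group summary of this kind can be sketched as below, assuming each test set result is available as a (group, value) pair for a given classifier and statistic. The group labels and the pooled "All" key are illustrative choices.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_by_group(rows):
    """Return {group: (mean, standard deviation)} plus a pooled 'All' entry.

    rows: iterable of (group, value) pairs, e.g. ('Bacteria', 0.98).
    Each group needs at least two values for the standard deviation.
    """
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
        groups["All"].append(value)  # pooled across every test set
    return {group: (mean(values), stdev(values)) for group, values in groups.items()}
```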

5.1 Specificity

Table 5.1: The specificity of binary CAZyme/non-CAZyme classification by CAZyme classifiers per taxonomy group
Prediction_tool Bact_Mean Bact_SD Bact_Lower_CI Bact_Upper_CI Euk_Mean Euk_SD Euk_Lower_CI Euk_Upper_CI All_Mean All_SD All_Lower_CI All_Upper_CI
CUPP 0.9888 0.0276 0.9799 0.9976 0.9748 0.0628 0.9547 0.9948 0.9818 0.0486 0.9742 0.9893
dbCAN_2 0.9828 0.0349 0.9716 0.9939 0.9718 0.0658 0.9507 0.9928 0.9772 0.0525 0.9691 0.9854
dbCAN_2:DIAMOND 0.9790 0.0352 0.9677 0.9903 0.9698 0.0684 0.9479 0.9916 0.9744 0.0541 0.9659 0.9828
dbCAN_2:HMMER 0.9858 0.0286 0.9766 0.9949 0.9715 0.0670 0.9501 0.9929 0.9786 0.0515 0.9706 0.9867
dbCAN_2:Hotpep 0.9788 0.0344 0.9677 0.9898 0.9730 0.0553 0.9553 0.9907 0.9759 0.0457 0.9687 0.9830
dbCAN_3 0.9685 0.0402 0.9557 0.9813 0.9500 0.1007 0.9178 0.9822 0.9592 0.0765 0.9473 0.9712
dbCAN_3:DIAMOND 0.9785 0.0363 0.9669 0.9901 0.9545 0.0967 0.9236 0.9854 0.9665 0.0734 0.9550 0.9780
dbCAN_3:HMMER 0.9868 0.0284 0.9777 0.9958 0.9708 0.0673 0.9492 0.9923 0.9788 0.0518 0.9707 0.9868
dbCAN_3:eCAMI 0.9772 0.0351 0.9660 0.9885 0.9732 0.0578 0.9548 0.9917 0.9752 0.0474 0.9678 0.9827
dbCAN_4 0.9740 0.0383 0.9618 0.9862 0.9505 0.1003 0.9184 0.9826 0.9622 0.0761 0.9504 0.9741
dbCAN_4:DIAMOND 0.9798 0.0355 0.9684 0.9911 0.9538 0.0980 0.9224 0.9851 0.9668 0.0742 0.9552 0.9783
dbCAN_4:HMMER 0.9868 0.0284 0.9777 0.9958 0.9708 0.0673 0.9492 0.9923 0.9788 0.0518 0.9707 0.9868
dbCAN_4:dbCAN-sub 0.9868 0.0270 0.9781 0.9954 0.9682 0.0690 0.9462 0.9903 0.9775 0.0527 0.9693 0.9857

5.2 Sensitivity

Table 5.2: The sensitivity of binary CAZyme/non-CAZyme classification by CAZyme classifiers per taxonomy group
Prediction_tool Bact_Mean Bact_SD Bact_Lower_CI Bact_Upper_CI Euk_Mean Euk_SD Euk_Lower_CI Euk_Upper_CI All_Mean All_SD All_Lower_CI All_Upper_CI
CUPP 0.8618 0.0788 0.8366 0.8869 0.8460 0.0699 0.8236 0.8684 0.8539 0.0742 0.8423 0.8655
dbCAN_2 0.9052 0.1144 0.8686 0.9419 0.8972 0.1029 0.8643 0.9302 0.9012 0.1079 0.8844 0.9181
dbCAN_2:DIAMOND 0.9182 0.1346 0.8752 0.9613 0.9132 0.1241 0.8736 0.9529 0.9158 0.1283 0.8957 0.9358
dbCAN_2:HMMER 0.8695 0.1078 0.8350 0.9040 0.8862 0.0379 0.8741 0.8984 0.8779 0.0804 0.8653 0.8904
dbCAN_2:Hotpep 0.8195 0.1164 0.7823 0.8567 0.7972 0.1371 0.7534 0.8411 0.8084 0.1264 0.7886 0.8281
dbCAN_3 0.9832 0.0605 0.9639 1.0026 0.9910 0.0130 0.9869 0.9951 0.9871 0.0435 0.9803 0.9939
dbCAN_3:DIAMOND 0.9560 0.1088 0.9212 0.9908 0.9818 0.0392 0.9692 0.9943 0.9689 0.0820 0.9561 0.9817
dbCAN_3:HMMER 0.8728 0.1088 0.8380 0.9075 0.9032 0.0353 0.8920 0.9145 0.8880 0.0815 0.8753 0.9007
dbCAN_3:eCAMI 0.8832 0.1258 0.8430 0.9235 0.8200 0.1232 0.7806 0.8594 0.8516 0.1273 0.8317 0.8715
dbCAN_4 0.9748 0.0897 0.9461 1.0034 0.9935 0.0083 0.9908 0.9962 0.9841 0.0638 0.9742 0.9941
dbCAN_4:DIAMOND 0.9498 0.1149 0.9130 0.9865 0.9890 0.0110 0.9855 0.9925 0.9694 0.0832 0.9564 0.9824
dbCAN_4:HMMER 0.8765 0.1097 0.8414 0.9116 0.9035 0.0342 0.8926 0.9144 0.8900 0.0816 0.8773 0.9027
dbCAN_4:dbCAN-sub 0.9520 0.1011 0.9197 0.9843 0.9475 0.0228 0.9402 0.9548 0.9498 0.0726 0.9384 0.9611

5.3 Precision

Table 5.3: The precision of binary CAZyme/non-CAZyme classification by CAZyme classifiers per taxonomy group
Prediction_tool Bact_Mean Bact_SD Bact_Lower_CI Bact_Upper_CI Euk_Mean Euk_SD Euk_Lower_CI Euk_Upper_CI All_Mean All_SD All_Lower_CI All_Upper_CI
CUPP 0.9877 0.0280 0.9787 0.9966 0.9748 0.0574 0.9564 0.9932 0.9812 0.0452 0.9742 0.9883
dbCAN_2 0.9824 0.0348 0.9713 0.9936 0.9734 0.0540 0.9562 0.9907 0.9779 0.0452 0.9709 0.9850
dbCAN_2:DIAMOND 0.9791 0.0348 0.9680 0.9902 0.9726 0.0533 0.9555 0.9896 0.9758 0.0447 0.9689 0.9828
dbCAN_2:HMMER 0.9847 0.0299 0.9751 0.9943 0.9729 0.0553 0.9552 0.9906 0.9788 0.0445 0.9719 0.9857
dbCAN_2:Hotpep 0.9759 0.0381 0.9637 0.9881 0.9676 0.0656 0.9466 0.9885 0.9717 0.0533 0.9634 0.9801
dbCAN_3 0.9704 0.0357 0.9590 0.9818 0.9585 0.0702 0.9361 0.9810 0.9644 0.0555 0.9558 0.9731
dbCAN_3:DIAMOND 0.9795 0.0332 0.9689 0.9901 0.9618 0.0686 0.9399 0.9837 0.9706 0.0541 0.9622 0.9791
dbCAN_3:HMMER 0.9858 0.0297 0.9763 0.9953 0.9725 0.0556 0.9547 0.9903 0.9791 0.0446 0.9722 0.9861
dbCAN_3:eCAMI 0.9764 0.0349 0.9653 0.9876 0.9713 0.0557 0.9534 0.9891 0.9738 0.0461 0.9666 0.9810
dbCAN_4 0.9755 0.0346 0.9645 0.9866 0.9589 0.0700 0.9365 0.9813 0.9672 0.0553 0.9586 0.9759
dbCAN_4:DIAMOND 0.9805 0.0327 0.9701 0.9910 0.9615 0.0690 0.9394 0.9836 0.9710 0.0544 0.9625 0.9795
dbCAN_4:HMMER 0.9858 0.0297 0.9763 0.9953 0.9725 0.0556 0.9548 0.9903 0.9792 0.0446 0.9722 0.9861
dbCAN_4:dbCAN-sub 0.9869 0.0258 0.9786 0.9951 0.9715 0.0547 0.9540 0.9889 0.9792 0.0431 0.9724 0.9859

5.4 F1-score

Table 5.4: The F1-score of binary CAZyme/non-CAZyme classification by CAZyme classifiers per taxonomy group
Prediction_tool Bact_Mean Bact_SD Bact_Lower_CI Bact_Upper_CI Euk_Mean Euk_SD Euk_Lower_CI Euk_Upper_CI All_Mean All_SD All_Lower_CI All_Upper_CI
CUPP 0.9184 0.0518 0.9018 0.9349 0.9035 0.0519 0.8869 0.9201 0.9109 0.0519 0.9028 0.9190
dbCAN_2 0.9373 0.0837 0.9105 0.9640 0.9292 0.0746 0.9053 0.9530 0.9332 0.0786 0.9209 0.9455
dbCAN_2:DIAMOND 0.9406 0.0985 0.9091 0.9722 0.9354 0.0868 0.9076 0.9632 0.9380 0.0920 0.9237 0.9524
dbCAN_2:HMMER 0.9188 0.0839 0.8919 0.9456 0.9263 0.0346 0.9152 0.9373 0.9225 0.0637 0.9126 0.9324
dbCAN_2:Hotpep 0.8861 0.0757 0.8619 0.9103 0.8680 0.1053 0.8344 0.9017 0.8771 0.0913 0.8628 0.8913
dbCAN_3 0.9753 0.0406 0.9623 0.9883 0.9730 0.0418 0.9596 0.9864 0.9742 0.0408 0.9678 0.9805
dbCAN_3:DIAMOND 0.9628 0.0818 0.9366 0.9889 0.9700 0.0459 0.9553 0.9847 0.9664 0.0658 0.9561 0.9767
dbCAN_3:HMMER 0.9210 0.0845 0.8940 0.9481 0.9354 0.0347 0.9243 0.9465 0.9282 0.0644 0.9182 0.9383
dbCAN_3:eCAMI 0.9219 0.0802 0.8962 0.9476 0.8833 0.0903 0.8544 0.9121 0.9026 0.0868 0.8890 0.9161
dbCAN_4 0.9720 0.0627 0.9519 0.9920 0.9745 0.0421 0.9611 0.9880 0.9733 0.0529 0.9650 0.9815
dbCAN_4:DIAMOND 0.9594 0.0890 0.9310 0.9879 0.9737 0.0415 0.9604 0.9870 0.9666 0.0691 0.9558 0.9774
dbCAN_4:HMMER 0.9231 0.0851 0.8959 0.9503 0.9356 0.0343 0.9246 0.9465 0.9293 0.0646 0.9193 0.9394
dbCAN_4:dbCAN-sub 0.9654 0.0738 0.9418 0.9889 0.9583 0.0308 0.9484 0.9682 0.9618 0.0561 0.9531 0.9706

5.5 Accuracy

Table 5.5: The accuracy of binary CAZyme/non-CAZyme classification by CAZyme classifiers per taxonomy group
Prediction_tool Bact_Mean Bact_SD Bact_Lower_CI Bact_Upper_CI Euk_Mean Euk_SD Euk_Lower_CI Euk_Upper_CI All_Mean All_SD All_Lower_CI All_Upper_CI
CUPP 0.9252 0.0430 0.9115 0.9390 0.9104 0.0481 0.8950 0.9258 0.9178 0.0458 0.9107 0.9250
dbCAN_2 0.9440 0.0600 0.9248 0.9632 0.9345 0.0594 0.9155 0.9535 0.9392 0.0594 0.9300 0.9485
dbCAN_2:DIAMOND 0.9486 0.0687 0.9267 0.9706 0.9415 0.0666 0.9202 0.9628 0.9451 0.0671 0.9346 0.9555
dbCAN_2:HMMER 0.9276 0.0561 0.9097 0.9456 0.9289 0.0372 0.9170 0.9408 0.9282 0.0472 0.9209 0.9356
dbCAN_2:Hotpep 0.8991 0.0603 0.8799 0.9184 0.8851 0.0813 0.8591 0.9111 0.8921 0.0712 0.8810 0.9032
dbCAN_3 0.9759 0.0354 0.9645 0.9872 0.9705 0.0509 0.9542 0.9868 0.9732 0.0435 0.9664 0.9800
dbCAN_3:DIAMOND 0.9672 0.0561 0.9493 0.9852 0.9681 0.0531 0.9512 0.9851 0.9677 0.0541 0.9592 0.9761
dbCAN_3:HMMER 0.9298 0.0568 0.9116 0.9479 0.9370 0.0378 0.9249 0.9491 0.9334 0.0479 0.9259 0.9409
dbCAN_3:eCAMI 0.9302 0.0635 0.9099 0.9506 0.8966 0.0693 0.8745 0.9188 0.9134 0.0680 0.9028 0.9241
dbCAN_4 0.9744 0.0474 0.9592 0.9895 0.9720 0.0513 0.9556 0.9884 0.9732 0.0489 0.9655 0.9808
dbCAN_4:DIAMOND 0.9648 0.0591 0.9459 0.9836 0.9714 0.0504 0.9552 0.9875 0.9681 0.0545 0.9596 0.9766
dbCAN_4:HMMER 0.9316 0.0574 0.9133 0.9500 0.9371 0.0375 0.9251 0.9491 0.9344 0.0481 0.9269 0.9419
dbCAN_4:dbCAN-sub 0.9694 0.0528 0.9525 0.9863 0.9579 0.0351 0.9466 0.9691 0.9636 0.0448 0.9566 0.9706

6 CAZy class classification

CAZy groups CAZymes into CAZy families by sequence similarity, and each CAZy family belongs to one of 6 functional classes. The CAZyme classifiers predict the CAZy family annotations of predicted CAZymes, but it is also of interest to assess the performance of the classifiers at the CAZy class level. Specifically, a classifier may struggle to predict the correct CAZy family for a CAZyme but consistently predict the correct CAZy class. Therefore, this part of the evaluation assesses how well the classifiers predict the correct CAZy class of predicted CAZymes.
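Moving from family-level to class-level predictions amounts to reducing each family label to its class prefix. The sketch below assumes family labels of the form class prefix plus number, optionally followed by a subfamily suffix (for example "GH5" or "GH5_1"); the function names are illustrative.

```python
import re

# The six CAZy classes.
CAZY_CLASSES = ("GH", "GT", "PL", "CE", "AA", "CBM")

_FAMILY_PATTERN = re.compile(r"(GH|GT|PL|CE|AA|CBM)\d+")


def family_to_class(family):
    """Map a family label such as 'GH5', 'GH5_1' or 'CBM50' to its class.

    Returns None if the label does not start with a recognised class prefix.
    """
    match = _FAMILY_PATTERN.match(family)
    return match.group(1) if match else None


def class_labels(families):
    """Return {class: 0 or 1} marking which classes the families fall into."""
    present = {family_to_class(family) for family in families}
    return {cazy_class: int(cazy_class in present) for cazy_class in CAZY_CLASSES}
```

Under this mapping a prediction of GH5 for a protein catalogued as GH9 is wrong at the family level but correct at the class level.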

6.2 Performance per CAZy class

Below the prediction sensitivity is plotted against the specificity for each classifier, and a separate plot is generated for each CAZy class.

The scatter plots of sensitivity against specificity overlay a coloured contour to highlight the distribution of the points. When too many points have the same value, a contour cannot be generated, so noise is added to the data. The original data are used to plot the scatter plot, and the data with added noise are used only to plot the contour.

The percentage of data points that need added noise in order to generate a contour varies from data set to data set. To change this percentage, change the third argument in the call to the function plot.class.sens.vs.spec(), which is used to generate the plots. The third argument is the proportion of data points to add noise to, written in decimal form.
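The idea of perturbing only a proportion of the points can be illustrated as below. This is a Python sketch of the general technique, not a translation of plot.class.sens.vs.spec(): the Gaussian noise, the `scale` default and the function name are all assumptions.

```python
import random


def add_noise(points, fraction, scale=0.01, seed=0):
    """Return a copy of points with noise added to a proportion of them.

    points: list of (specificity, sensitivity) tuples.
    fraction: proportion of points to perturb, in decimal form (e.g. 0.3).
    scale: standard deviation of the Gaussian noise (hypothetical default).
    """
    rng = random.Random(seed)
    n_noisy = round(len(points) * fraction)
    noisy_indices = set(rng.sample(range(len(points)), n_noisy))
    noisy_points = []
    for index, (x, y) in enumerate(points):
        if index in noisy_indices:
            # Perturb both coordinates so identical points no longer coincide.
            noisy_points.append((x + rng.gauss(0, scale), y + rng.gauss(0, scale)))
        else:
            noisy_points.append((x, y))
    return noisy_points
```

The input list is left unmodified, so it can still be used for the scatter plot while the returned copy is used for the contour.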

6.2.1 GH class classification

Table 6.3: Overall performance of CAZyme classifiers classification of GH class members
Prediction_tool Spec_Mean Spec_SD Spec_CI_Lower Spec_CI_Upper Sens_Mean Sens_SD Sens_CI_Lower Sens_CI_Upper Prec_Mean Prec_SD Prec_CI_Lower Prec_CI_Upper F1_Mean F1_SD F1_CI_Lower F1_CI_Upper Acc_Mean Acc_SD Acc_CI_Lower Acc_CI_Upper
CUPP 0.9933 0.0212 0.9886 0.9980 0.9080 0.0675 0.8930 0.9230 0.9906 0.0338 0.9831 0.9981 0.9461 0.0454 0.9360 0.9562 0.9540 0.0293 0.9475 0.9605
dbCAN_2 0.9917 0.0245 0.9862 0.9972 0.9354 0.0907 0.9152 0.9556 0.9886 0.0347 0.9809 0.9963 0.9583 0.0614 0.9446 0.9719 0.9658 0.0361 0.9578 0.9738
dbCAN_2:DIAMOND 0.9862 0.0276 0.9801 0.9924 0.9379 0.1064 0.9142 0.9615 0.9845 0.0360 0.9765 0.9926 0.9563 0.0727 0.9401 0.9725 0.9644 0.0415 0.9551 0.9736
dbCAN_2:HMMER 0.9927 0.0242 0.9873 0.9981 0.9080 0.0824 0.8896 0.9263 0.9897 0.0339 0.9821 0.9972 0.9444 0.0587 0.9313 0.9574 0.9536 0.0330 0.9463 0.9609
dbCAN_2:Hotpep 0.9833 0.0286 0.9769 0.9897 0.8641 0.1181 0.8378 0.8904 0.9796 0.0419 0.9702 0.9889 0.9136 0.0773 0.8964 0.9308 0.9297 0.0495 0.9187 0.9407
dbCAN_3 0.9923 0.0228 0.9873 0.9974 0.9567 0.0771 0.9396 0.9739 0.9897 0.0332 0.9823 0.9971 0.9708 0.0558 0.9583 0.9832 0.9768 0.0304 0.9701 0.9836
dbCAN_3:DIAMOND 0.9844 0.0292 0.9779 0.9909 0.9760 0.0673 0.9610 0.9910 0.9830 0.0391 0.9743 0.9917 0.9776 0.0491 0.9666 0.9885 0.9812 0.0295 0.9747 0.9878
dbCAN_3:eCAMI 0.9848 0.0295 0.9782 0.9913 0.8764 0.1097 0.8520 0.9008 0.9829 0.0395 0.9741 0.9917 0.9223 0.0695 0.9069 0.9378 0.9364 0.0485 0.9256 0.9472
dbCAN_3:HMMER 0.9939 0.0224 0.9889 0.9988 0.9198 0.0830 0.9013 0.9383 0.9909 0.0330 0.9836 0.9983 0.9514 0.0596 0.9382 0.9647 0.9599 0.0331 0.9525 0.9672
dbCAN_4 0.9927 0.0240 0.9873 0.9980 0.9500 0.0802 0.9322 0.9679 0.9898 0.0337 0.9823 0.9973 0.9671 0.0590 0.9539 0.9802 0.9743 0.0307 0.9675 0.9812
dbCAN_4:dbCAN-sub 0.9924 0.0238 0.9871 0.9977 0.9473 0.0789 0.9297 0.9648 0.9896 0.0337 0.9821 0.9971 0.9655 0.0581 0.9526 0.9785 0.9731 0.0297 0.9665 0.9797
dbCAN_4:DIAMOND 0.9864 0.0279 0.9802 0.9927 0.9737 0.0763 0.9567 0.9906 0.9842 0.0393 0.9754 0.9929 0.9764 0.0583 0.9634 0.9894 0.9812 0.0304 0.9745 0.9880
dbCAN_4:HMMER 0.9939 0.0224 0.9889 0.9988 0.9202 0.0828 0.9017 0.9386 0.9909 0.0330 0.9836 0.9983 0.9516 0.0596 0.9384 0.9649 0.9601 0.0328 0.9528 0.9674

Figure 6.5: Scatter plot of sensitivity against specificity for predicting GH CAZy class members per CAZyme classifier, overlaying a density map.

Figure 6.6: Summary statistics of CAZyme classifiers performances of GH class classification, plotting the mean plus and minus the 95% confidence interval.

Figure 6.7: One dimensional scatter plot of the statistical parameters per test set for the classification of GH class members, overlaying a box plot

6.2.2 GT class classification

Table 6.4: Overall performance of CAZyme classifiers classification of GT class members
Prediction_tool Spec_Mean Spec_SD Spec_CI_Lower Spec_CI_Upper Sens_Mean Sens_SD Sens_CI_Lower Sens_CI_Upper Prec_Mean Prec_SD Prec_CI_Lower Prec_CI_Upper F1_Mean F1_SD F1_CI_Lower F1_CI_Upper Acc_Mean Acc_SD Acc_CI_Lower Acc_CI_Upper
CUPP 0.9921 0.0463 0.9818 1.0024 0.8536 0.1107 0.8289 0.8782 0.9883 0.0625 0.9744 1.0022 0.9107 0.0758 0.8938 0.9276 0.9462 0.0579 0.9333 0.9591
dbCAN_2 0.9927 0.0454 0.9826 1.0028 0.8845 0.1378 0.8538 0.9152 0.9898 0.0542 0.9778 1.0019 0.9258 0.1016 0.9032 0.9484 0.9550 0.0770 0.9379 0.9721
dbCAN_2:DIAMOND 0.9919 0.0460 0.9817 1.0021 0.9255 0.1508 0.8919 0.9591 0.9886 0.0567 0.9760 1.0013 0.9463 0.1124 0.9213 0.9713 0.9672 0.0821 0.9489 0.9854
dbCAN_2:HMMER 0.9904 0.0487 0.9796 1.0012 0.8627 0.1126 0.8376 0.8877 0.9884 0.0566 0.9758 1.0010 0.9152 0.0845 0.8964 0.9340 0.9498 0.0598 0.9365 0.9631
dbCAN_2:Hotpep 0.9924 0.0421 0.9830 1.0017 0.7254 0.1807 0.6852 0.7656 0.9836 0.0688 0.9683 0.9989 0.8209 0.1383 0.7901 0.8517 0.9031 0.0908 0.8829 0.9233
dbCAN_3 0.9914 0.0474 0.9808 1.0019 0.9421 0.0971 0.9205 0.9637 0.9891 0.0563 0.9766 1.0016 0.9606 0.0790 0.9430 0.9782 0.9751 0.0585 0.9621 0.9881
dbCAN_3:DIAMOND 0.9893 0.0486 0.9784 1.0001 0.9774 0.0897 0.9574 0.9973 0.9839 0.0620 0.9701 0.9977 0.9764 0.0755 0.9596 0.9932 0.9848 0.0567 0.9722 0.9974
dbCAN_3:eCAMI 0.9922 0.0439 0.9824 1.0019 0.8500 0.1524 0.8161 0.8839 0.9881 0.0572 0.9754 1.0009 0.9046 0.1076 0.8806 0.9285 0.9413 0.0822 0.9230 0.9596
dbCAN_3:HMMER 0.9904 0.0487 0.9796 1.0012 0.8654 0.1113 0.8406 0.8901 0.9884 0.0566 0.9758 1.0010 0.9169 0.0839 0.8982 0.9355 0.9503 0.0597 0.9370 0.9635
dbCAN_4 0.9900 0.0492 0.9790 1.0009 0.9578 0.0921 0.9373 0.9783 0.9866 0.0566 0.9740 0.9992 0.9677 0.0753 0.9510 0.9845 0.9781 0.0572 0.9654 0.9909
dbCAN_4:dbCAN-sub 0.9900 0.0492 0.9790 1.0009 0.9538 0.0835 0.9352 0.9724 0.9866 0.0566 0.9740 0.9992 0.9664 0.0656 0.9518 0.9810 0.9773 0.0519 0.9657 0.9889
dbCAN_4:DIAMOND 0.9893 0.0488 0.9785 1.0002 0.9751 0.0914 0.9548 0.9954 0.9841 0.0638 0.9699 0.9983 0.9750 0.0781 0.9577 0.9924 0.9838 0.0577 0.9710 0.9966
dbCAN_4:HMMER 0.9904 0.0487 0.9796 1.0012 0.8657 0.1110 0.8410 0.8904 0.9884 0.0566 0.9758 1.0010 0.9170 0.0838 0.8984 0.9357 0.9503 0.0597 0.9370 0.9635

Figure 6.8: Scatter plot of sensitivity against specificity for predicting GT CAZy class members per CAZyme classifier, overlaying a density map.

Figure 6.9: Summary statistics of CAZyme classifiers performances of GT class classification, plotting the mean plus and minus the 95% confidence interval.

Figure 6.10: One dimensional scatter plot of the statistical parameters per test set for the classification of GT class members, overlaying a box plot

6.2.3 PL class classification

Table 6.5: Overall performance of CAZyme classifiers classification of PL class members
Prediction_tool Spec_Mean Spec_SD Spec_CI_Lower Spec_CI_Upper Sens_Mean Sens_SD Sens_CI_Lower Sens_CI_Upper Prec_Mean Prec_SD Prec_CI_Lower Prec_CI_Upper F1_Mean F1_SD F1_CI_Lower F1_CI_Upper Acc_Mean Acc_SD Acc_CI_Lower Acc_CI_Upper
CUPP 0.9996 0.0019 0.9990 1.0002 0.8511 0.2593 0.7749 0.9272 0.9496 0.2058 0.8892 1.0101 0.8850 0.2288 0.8178 0.9522 0.9941 0.0096 0.9913 0.9970
dbCAN_2 0.9998 0.0012 0.9995 1.0002 0.8797 0.2421 0.8086 0.9508 0.9532 0.2052 0.8929 1.0134 0.9073 0.2183 0.8432 0.9714 0.9959 0.0070 0.9938 0.9979
dbCAN_2:DIAMOND 0.9996 0.0019 0.9991 1.0002 0.8691 0.2687 0.7902 0.9480 0.9248 0.2505 0.8513 0.9984 0.8889 0.2530 0.8146 0.9632 0.9954 0.0073 0.9933 0.9976
dbCAN_2:HMMER 0.9998 0.0012 0.9995 1.0002 0.8975 0.2125 0.8351 0.9598 0.9745 0.1481 0.9310 1.0180 0.9250 0.1788 0.8725 0.9775 0.9963 0.0062 0.9944 0.9981
dbCAN_2:Hotpep 0.9993 0.0027 0.9985 1.0001 0.8407 0.2581 0.7650 0.9165 0.9506 0.2050 0.8904 1.0108 0.8803 0.2257 0.8140 0.9465 0.9929 0.0131 0.9891 0.9968
dbCAN_3 0.9994 0.0025 0.9986 1.0001 0.9881 0.0732 0.9666 1.0096 0.9846 0.0766 0.9621 1.0071 0.9826 0.0689 0.9624 1.0028 0.9990 0.0030 0.9981 0.9999
dbCAN_3:DIAMOND 0.9989 0.0039 0.9978 1.0000 0.9881 0.0732 0.9666 1.0096 0.9730 0.1012 0.9433 1.0028 0.9754 0.0792 0.9521 0.9986 0.9985 0.0041 0.9973 0.9997
dbCAN_3:eCAMI 0.9996 0.0018 0.9991 1.0002 0.7960 0.2776 0.7154 0.8766 0.9333 0.2452 0.8621 1.0045 0.8473 0.2533 0.7737 0.9208 0.9920 0.0129 0.9883 0.9958
dbCAN_3:HMMER 0.9994 0.0025 0.9986 1.0001 0.9739 0.0982 0.9451 1.0028 0.9846 0.0766 0.9621 1.0071 0.9741 0.0781 0.9511 0.9970 0.9986 0.0035 0.9975 0.9996
dbCAN_4 0.9994 0.0025 0.9986 1.0001 0.9739 0.0982 0.9451 1.0028 0.9846 0.0766 0.9621 1.0071 0.9741 0.0781 0.9511 0.9970 0.9986 0.0035 0.9975 0.9996
dbCAN_4:dbCAN-sub 0.9994 0.0025 0.9986 1.0001 0.9752 0.0982 0.9464 1.0040 0.9846 0.0766 0.9621 1.0071 0.9747 0.0782 0.9518 0.9977 0.9988 0.0032 0.9978 0.9997
dbCAN_4:DIAMOND 0.9989 0.0039 0.9978 1.0000 0.9987 0.0086 0.9962 1.0013 0.9730 0.1012 0.9433 1.0028 0.9825 0.0646 0.9635 1.0014 0.9988 0.0039 0.9976 0.9999
dbCAN_4:HMMER 0.9994 0.0025 0.9986 1.0001 0.9739 0.0982 0.9451 1.0028 0.9846 0.0766 0.9621 1.0071 0.9741 0.0781 0.9511 0.9970 0.9986 0.0035 0.9975 0.9996
Figure 6.11: Scatter plot of sensitivity against specificity for predicting PL CAZy class members per CAZyme classifier, overlaying a density map.

Figure 6.12: Summary statistics of CAZyme classifier performance for PL class classification, plotting the mean plus and minus the 95% confidence interval.

Figure 6.13: One-dimensional scatter plot of the statistical parameters per test set for the classification of PL class members, overlaying a box plot.

6.2.4 CE class classification

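Table 6.6: Overall performance of the CAZyme classifiers for the classification of CE class members (Spec: specificity; Sens: sensitivity; Prec: precision; Acc: accuracy; CI: confidence interval)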
Prediction_tool Spec Mean Spec Standard Deviation Spec CI Lower Spec CI Upper Sens Mean Sens Standard Deviation Sens CI Lower Sens CI Upper Prec Mean Prec Standard Deviation Prec CI Lower Prec CI Upper F1-score Mean F1-score Standard Deviation F1-score CI Lower F1-score CI Upper Acc Mean Acc Standard Deviation Acc CI Lower Acc CI Upper
CUPP 0.9955 0.0178 0.9914 0.9995 0.9114 0.1332 0.8810 0.9419 0.9598 0.1130 0.9339 0.9856 0.9250 0.1095 0.9000 0.9500 0.9900 0.0187 0.9857 0.9943
dbCAN_2 0.9937 0.0220 0.9887 0.9988 0.9213 0.1433 0.8886 0.9540 0.9519 0.1408 0.9197 0.9840 0.9224 0.1305 0.8926 0.9522 0.9893 0.0225 0.9842 0.9945
dbCAN_2:DIAMOND 0.9941 0.0210 0.9893 0.9989 0.8480 0.2469 0.7915 0.9044 0.9280 0.2089 0.8803 0.9757 0.8636 0.2234 0.8126 0.9147 0.9862 0.0242 0.9807 0.9918
dbCAN_2:HMMER 0.9945 0.0187 0.9902 0.9988 0.9208 0.1363 0.8896 0.9519 0.9525 0.1191 0.9253 0.9797 0.9248 0.1119 0.8993 0.9504 0.9901 0.0189 0.9858 0.9945
dbCAN_2:Hotpep 0.9905 0.0229 0.9852 0.9957 0.8508 0.2065 0.8036 0.8980 0.9128 0.1671 0.8746 0.9510 0.8555 0.1705 0.8165 0.8944 0.9821 0.0251 0.9764 0.9878
dbCAN_3 0.9936 0.0224 0.9885 0.9988 0.9283 0.1537 0.8932 0.9634 0.9526 0.1391 0.9208 0.9844 0.9236 0.1397 0.8916 0.9555 0.9896 0.0220 0.9846 0.9946
dbCAN_3:DIAMOND 0.9925 0.0234 0.9872 0.9979 0.9303 0.1704 0.8913 0.9692 0.9304 0.1864 0.8878 0.9730 0.9152 0.1682 0.8768 0.9537 0.9890 0.0231 0.9837 0.9943
dbCAN_3:eCAMI 0.9926 0.0213 0.9877 0.9975 0.8073 0.2445 0.7514 0.8632 0.9156 0.1909 0.8719 0.9592 0.8314 0.2020 0.7852 0.8775 0.9818 0.0249 0.9761 0.9875
dbCAN_3:HMMER 0.9953 0.0179 0.9912 0.9994 0.9230 0.1335 0.8925 0.9535 0.9602 0.1103 0.9350 0.9854 0.9302 0.1073 0.9057 0.9547 0.9910 0.0177 0.9870 0.9951
dbCAN_4 0.9953 0.0179 0.9912 0.9994 0.9783 0.0669 0.9630 0.9935 0.9603 0.1103 0.9351 0.9855 0.9638 0.0854 0.9443 0.9833 0.9943 0.0172 0.9904 0.9982
dbCAN_4:dbCAN-sub 0.9952 0.0181 0.9911 0.9993 0.9755 0.0680 0.9599 0.9910 0.9592 0.1120 0.9336 0.9848 0.9617 0.0863 0.9420 0.9814 0.9938 0.0175 0.9898 0.9978
dbCAN_4:DIAMOND 0.9931 0.0227 0.9879 0.9983 0.9755 0.0851 0.9560 0.9949 0.9476 0.1506 0.9132 0.9820 0.9507 0.1192 0.9234 0.9779 0.9919 0.0218 0.9870 0.9969
dbCAN_4:HMMER 0.9953 0.0179 0.9912 0.9994 0.9529 0.1012 0.9298 0.9760 0.9603 0.1103 0.9351 0.9855 0.9487 0.0957 0.9268 0.9706 0.9929 0.0173 0.9889 0.9968
Figure 6.14: Scatter plot of sensitivity against specificity for predicting CE CAZy class members per CAZyme classifier, overlaying a density map.

Figure 6.15: Summary statistics of CAZyme classifier performance for CE class classification, plotting the mean plus and minus the 95% confidence interval.

Figure 6.16: One-dimensional scatter plot of the statistical parameters per test set for the classification of CE class members, overlaying a box plot.

6.2.5 AA class classification

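Table 6.7: Overall performance of the CAZyme classifiers for the classification of AA class members (Spec: specificity; Sens: sensitivity; Prec: precision; Acc: accuracy; CI: confidence interval)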
Prediction_tool Spec Mean Spec Standard Deviation Spec CI Lower Spec CI Upper Sens Mean Sens Standard Deviation Sens CI Lower Sens CI Upper Prec Mean Prec Standard Deviation Prec CI Lower Prec CI Upper F1-score Mean F1-score Standard Deviation F1-score CI Lower F1-score CI Upper Acc Mean Acc Standard Deviation Acc CI Lower Acc CI Upper
CUPP 0.9919 0.0175 0.9868 0.9970 0.9047 0.1300 0.8670 0.9425 0.9184 0.1697 0.8691 0.9677 0.9011 0.1376 0.8611 0.9410 0.9841 0.0215 0.9778 0.9903
dbCAN_2 0.9919 0.0178 0.9868 0.9971 0.9190 0.1152 0.8855 0.9524 0.9208 0.1693 0.8716 0.9699 0.9104 0.1322 0.8720 0.9488 0.9856 0.0204 0.9796 0.9915
dbCAN_2:DIAMOND 0.9922 0.0178 0.9870 0.9974 0.8693 0.1786 0.8174 0.9212 0.9228 0.1710 0.8731 0.9724 0.8766 0.1603 0.8301 0.9232 0.9837 0.0212 0.9775 0.9899
dbCAN_2:HMMER 0.9913 0.0187 0.9859 0.9967 0.9375 0.0917 0.9109 0.9641 0.9191 0.1607 0.8724 0.9658 0.9173 0.1137 0.8843 0.9503 0.9856 0.0202 0.9797 0.9914
dbCAN_2:Hotpep 0.9920 0.0180 0.9867 0.9972 0.8737 0.1942 0.8173 0.9301 0.8983 0.2166 0.8354 0.9612 0.8733 0.1924 0.8174 0.9292 0.9827 0.0227 0.9761 0.9893
dbCAN_3 0.9911 0.0199 0.9853 0.9969 0.9835 0.0450 0.9705 0.9966 0.9236 0.1605 0.8770 0.9702 0.9436 0.1100 0.9117 0.9756 0.9901 0.0188 0.9846 0.9955
dbCAN_3:DIAMOND 0.9900 0.0219 0.9837 0.9964 0.9884 0.0359 0.9779 0.9988 0.9171 0.1693 0.8679 0.9662 0.9413 0.1144 0.9080 0.9745 0.9897 0.0200 0.9839 0.9955
dbCAN_3:eCAMI 0.9921 0.0167 0.9873 0.9970 0.7821 0.2611 0.7063 0.8579 0.8514 0.2809 0.7699 0.9330 0.8006 0.2558 0.7263 0.8749 0.9791 0.0233 0.9724 0.9859
dbCAN_3:HMMER 0.9907 0.0202 0.9848 0.9965 0.9880 0.0393 0.9766 0.9994 0.9192 0.1624 0.8720 0.9664 0.9434 0.1112 0.9111 0.9757 0.9901 0.0196 0.9844 0.9958
dbCAN_4 0.9907 0.0201 0.9848 0.9965 0.9937 0.0258 0.9862 1.0012 0.9198 0.1610 0.8731 0.9666 0.9464 0.1085 0.9149 0.9779 0.9907 0.0188 0.9853 0.9962
dbCAN_4:dbCAN-sub 0.9905 0.0202 0.9846 0.9963 0.9892 0.0331 0.9796 0.9988 0.9177 0.1620 0.8707 0.9648 0.9430 0.1090 0.9113 0.9746 0.9902 0.0189 0.9847 0.9956
dbCAN_4:DIAMOND 0.9903 0.0219 0.9839 0.9966 0.9700 0.0952 0.9423 0.9976 0.9196 0.1697 0.8703 0.9689 0.9306 0.1226 0.8950 0.9662 0.9897 0.0198 0.9840 0.9954
dbCAN_4:HMMER 0.9907 0.0202 0.9848 0.9965 0.9880 0.0393 0.9766 0.9994 0.9192 0.1624 0.8720 0.9664 0.9434 0.1112 0.9111 0.9757 0.9901 0.0196 0.9844 0.9958
Figure 6.17: Scatter plot of sensitivity against specificity for predicting AA CAZy class members per CAZyme classifier, overlaying a density map.

Figure 6.18: Summary statistics of CAZyme classifier performance for AA class classification, plotting the mean plus and minus the 95% confidence interval.

Figure 6.19: One-dimensional scatter plot of the statistical parameters per test set for the classification of AA class members, overlaying a box plot.

6.2.6 CBM class classification

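Table 6.8: Overall performance of the CAZyme classifiers for the classification of CBM class members (Spec: specificity; Sens: sensitivity; Prec: precision; Acc: accuracy; CI: confidence interval)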
Prediction_tool Spec Mean Spec Standard Deviation Spec CI Lower Spec CI Upper Sens Mean Sens Standard Deviation Sens CI Lower Sens CI Upper Prec Mean Prec Standard Deviation Prec CI Lower Prec CI Upper F1-score Mean F1-score Standard Deviation F1-score CI Lower F1-score CI Upper Acc Mean Acc Standard Deviation Acc CI Lower Acc CI Upper
CUPP 1.0000 0.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.8851 0.0851 0.8662 0.9040
dbCAN_2 0.9922 0.0119 0.9896 0.9948 0.7242 0.1925 0.6813 0.7670 0.9203 0.1333 0.8906 0.9500 0.7970 0.1542 0.7627 0.8314 0.9656 0.0257 0.9599 0.9713
dbCAN_2:DIAMOND 0.9914 0.0149 0.9881 0.9947 0.7854 0.2191 0.7367 0.8342 0.9219 0.1502 0.8885 0.9553 0.8325 0.1762 0.7933 0.8717 0.9719 0.0252 0.9663 0.9775
dbCAN_2:HMMER 0.9961 0.0088 0.9941 0.9980 0.4129 0.2165 0.3648 0.4611 0.8965 0.2304 0.8452 0.9478 0.5414 0.2150 0.4935 0.5892 0.9378 0.0296 0.9312 0.9444
dbCAN_2:Hotpep 0.9014 0.0511 0.8901 0.9128 0.7061 0.2102 0.6594 0.7529 0.4696 0.1731 0.4311 0.5081 0.5508 0.1723 0.5125 0.5892 0.8823 0.0551 0.8701 0.8946
dbCAN_3 0.9943 0.0109 0.9919 0.9968 0.7680 0.1926 0.7252 0.8109 0.9431 0.1355 0.9129 0.9732 0.8354 0.1546 0.8010 0.8698 0.9718 0.0228 0.9667 0.9768
dbCAN_3:DIAMOND 0.9931 0.0144 0.9899 0.9963 0.8675 0.1731 0.8290 0.9060 0.9512 0.1283 0.9227 0.9797 0.8986 0.1383 0.8678 0.9294 0.9814 0.0185 0.9773 0.9855
dbCAN_3:eCAMI 0.9470 0.0523 0.9354 0.9587 0.7346 0.2203 0.6856 0.7836 0.6615 0.2254 0.6113 0.7117 0.6810 0.2079 0.6347 0.7272 0.9267 0.0565 0.9141 0.9392
dbCAN_3:HMMER 0.9960 0.0088 0.9941 0.9980 0.4135 0.2148 0.3657 0.4613 0.8964 0.2304 0.8451 0.9476 0.5425 0.2134 0.4950 0.5900 0.9379 0.0300 0.9313 0.9446
dbCAN_4 0.9951 0.0103 0.9928 0.9974 0.7995 0.1921 0.7567 0.8422 0.9562 0.0919 0.9358 0.9767 0.8547 0.1392 0.8237 0.8857 0.9763 0.0207 0.9717 0.9809
dbCAN_4:dbCAN-sub 0.9927 0.0129 0.9898 0.9955 0.8193 0.1995 0.7749 0.8637 0.9376 0.1151 0.9120 0.9632 0.8576 0.1492 0.8244 0.8908 0.9761 0.0244 0.9707 0.9816
dbCAN_4:DIAMOND 0.9938 0.0146 0.9906 0.9970 0.8773 0.1695 0.8396 0.9151 0.9571 0.1273 0.9288 0.9854 0.9072 0.1362 0.8768 0.9375 0.9830 0.0182 0.9790 0.9870
dbCAN_4:HMMER 0.9958 0.0097 0.9937 0.9980 0.4740 0.2360 0.4215 0.5266 0.9184 0.1999 0.8739 0.9629 0.5954 0.2202 0.5464 0.6444 0.9436 0.0300 0.9370 0.9503
Figure 6.20: Scatter plot of sensitivity against specificity for predicting CBM CAZy class members per CAZyme classifier, overlaying a density map.

Figure 6.21: Summary statistics of CAZyme classifier performance for CBM class classification, plotting the mean plus and minus the 95% confidence interval.

Figure 6.22: One-dimensional scatter plot of the statistical parameters per test set for the classification of CBM class members, overlaying a box plot.
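
The five statistics reported per CAZy class follow from the confusion matrix of a binary "member of the class / not a member" classification. The sketch below uses the standard definitions. The function name `binary_stats`, the convention of returning 0 when a ratio is undefined, and the example counts are assumptions for illustration, not details taken from the evaluation code. It shows that a tool which assigns no protein to a class scores a specificity of 1 and a sensitivity, precision and F1-score of 0 while accuracy stays high, the same pattern as the CUPP row in Table 6.8. This section does not state whether that is the cause for CUPP.

```python
def binary_stats(tp, fp, tn, fn):
    """Compute per-class binary statistics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio (zero denominator) is reported as 0.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall: true members that were found
    specificity = ratio(tn, tn + fp)   # non-members correctly left out
    precision = ratio(tp, tp + fp)     # predicted members that are true members
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical test set of 100 proteins with 12 true CBM members,
# scored for a tool that never predicts the CBM class.
stats = binary_stats(tp=0, fp=0, tn=88, fn=12)
print(stats)
```

Because non-members dominate each test set in this example, accuracy alone hides the complete loss of sensitivity, which is why the tables report all five statistics.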

6.3 Performance per statistic

Instead of producing one plot per CAZy class and facet wrapping by statistic, we can produce one plot per statistic (sensitivity, precision, etc.) and facet wrap by CAZy class. This makes it easier to compare performance between CAZy classes and to evaluate the classifiers statistic by statistic.
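
As a minimal sketch of this regrouping, the snippet below nests long-format records first by statistic (one plot each) and then by CAZy class (one facet each). The record layout and the helper name `group_by_statistic` are assumptions for illustration. The mean values are copied from Tables 6.5 and 6.6.

```python
from collections import defaultdict

# (classifier, CAZy class, statistic, mean) records, copied from Tables 6.5 and 6.6.
records = [
    ("dbCAN_4", "PL", "Specificity", 0.9994),
    ("dbCAN_4", "PL", "Sensitivity", 0.9739),
    ("dbCAN_4", "CE", "Specificity", 0.9953),
    ("dbCAN_4", "CE", "Sensitivity", 0.9783),
    ("CUPP", "PL", "Specificity", 0.9996),
    ("CUPP", "PL", "Sensitivity", 0.8511),
    ("CUPP", "CE", "Specificity", 0.9955),
    ("CUPP", "CE", "Sensitivity", 0.9114),
]


def group_by_statistic(rows):
    """Nest rows as statistic -> CAZy class -> {classifier: mean}."""
    panels = defaultdict(lambda: defaultdict(dict))
    for tool, cazy_class, statistic, mean in rows:
        panels[statistic][cazy_class][tool] = mean
    return panels


panels = group_by_statistic(records)
for statistic, facets in panels.items():
    print(statistic)  # one plot per statistic
    for cazy_class, tools in facets.items():
        print(f"  {cazy_class}: {tools}")  # one facet per CAZy class
```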

6.3.1 CAZy class specificity

6.3.2 CAZy class sensitivity

6.3.3 CAZy class precision

6.3.4 CAZy class F1-score

6.3.5 CAZy class accuracy

7 CAZy class multilabel classification

A single CAZyme can belong to multiple CAZy classes, which makes CAZy class prediction a multilabel classification problem. To evaluate the multilabel classification of CAZy classes, the Rand Index (RI) and Adjusted Rand Index (ARI) were calculated.

The RI is a measure of accuracy across all potential CAZy class annotations of a protein, and ranges from 0 (no correct annotations) to 1 (all annotations correct). The ARI is the RI adjusted for chance: a value of 1 indicates that all annotations are correct, 0 is equivalent to assigning the CAZy class annotations randomly, and -1 indicates that the annotations are systematically assigned incorrectly.
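
The two indices can be sketched with the standard pair-counting definitions. The snippet assumes, purely for illustration, that each protein is represented by a 0/1 vector over the six CAZy classes (known annotations versus predicted annotations) and that the indices are computed on that pair of vectors. How the evaluation itself builds these vectors is not stated in this section. The function names and the example protein are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

CLASSES = ["GH", "GT", "PL", "CE", "AA", "CBM"]  # assumed class order


def rand_index(truth, pred):
    """Fraction of item pairs on which the two labellings agree."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)


def adjusted_rand_index(truth, pred):
    """Rand Index corrected for the agreement expected by chance."""
    n = len(truth)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_truth * sum_pred / comb(n, 2)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. every class has the same label in both vectors);
        # treated as perfect agreement, which is an assumed convention.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical protein: known to be GH only, predicted as GH and CBM.
known = [1, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 1]
print(f"RI  = {rand_index(known, predicted):.4f}")
print(f"ARI = {adjusted_rand_index(known, predicted):.4f}")
```

For this example a single spurious CBM annotation gives an RI of about 0.67 but an ARI of about 0.35, showing how the chance correction penalises the error more strongly.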

8 CAZy class taxonomic performance

8.1 Across all of CAZy

8.1.1 Specificity

Table 8.1: Specificity of CAZy class classification by CAZyme classifier, per taxonomy group
Prediction_tool Mean Standard Deviation Lower CI Upper CI Tax_group
dbCAN_2 0.9933 0.0259 0.9908 0.9958 All
dbCAN_2:HMMER 0.9939 0.0264 0.9913 0.9964 All
dbCAN_2:DIAMOND 0.9920 0.0270 0.9894 0.9946 All
dbCAN_2:Hotpep 0.9733 0.0491 0.9685 0.9781 All
dbCAN_3 0.9934 0.0265 0.9909 0.9960 All
dbCAN_3:HMMER 0.9941 0.0261 0.9916 0.9967 All
dbCAN_3:DIAMOND 0.9909 0.0289 0.9881 0.9937 All
dbCAN_3:eCAMI 0.9829 0.0388 0.9791 0.9867 All
dbCAN_4 0.9936 0.0267 0.9910 0.9962 All
dbCAN_4:HMMER 0.9941 0.0262 0.9915 0.9966 All
dbCAN_4:DIAMOND 0.9915 0.0285 0.9888 0.9943 All
dbCAN_4:dbCAN-sub 0.9931 0.0269 0.9905 0.9957 All
CUPP 0.9953 0.0246 0.9929 0.9977 All
dbCAN_2 0.9951 0.0157 0.9930 0.9973 Bacteria
dbCAN_2:HMMER 0.9959 0.0139 0.9939 0.9978 Bacteria
dbCAN_2:DIAMOND 0.9932 0.0185 0.9907 0.9958 Bacteria
dbCAN_2:Hotpep 0.9695 0.0509 0.9625 0.9766 Bacteria
dbCAN_3 0.9950 0.0160 0.9928 0.9973 Bacteria
dbCAN_3:HMMER 0.9962 0.0133 0.9944 0.9981 Bacteria
dbCAN_3:DIAMOND 0.9930 0.0191 0.9904 0.9957 Bacteria
dbCAN_3:eCAMI 0.9798 0.0392 0.9743 0.9852 Bacteria
dbCAN_4 0.9958 0.0137 0.9939 0.9977 Bacteria
dbCAN_4:HMMER 0.9962 0.0133 0.9944 0.9981 Bacteria
dbCAN_4:DIAMOND 0.9942 0.0173 0.9918 0.9966 Bacteria
dbCAN_4:dbCAN-sub 0.9954 0.0139 0.9935 0.9973 Bacteria
CUPP 0.9970 0.0128 0.9952 0.9988 Bacteria
dbCAN_2 0.9916 0.0328 0.9871 0.9961 Eukaryote
dbCAN_2:HMMER 0.9920 0.0343 0.9873 0.9967 Eukaryote
dbCAN_2:DIAMOND 0.9909 0.0332 0.9864 0.9954 Eukaryote
dbCAN_2:Hotpep 0.9769 0.0472 0.9705 0.9833 Eukaryote
dbCAN_3 0.9919 0.0335 0.9874 0.9965 Eukaryote
dbCAN_3:HMMER 0.9921 0.0341 0.9875 0.9968 Eukaryote
dbCAN_3:DIAMOND 0.9888 0.0357 0.9839 0.9936 Eukaryote
dbCAN_3:eCAMI 0.9859 0.0382 0.9807 0.9911 Eukaryote
dbCAN_4 0.9916 0.0348 0.9868 0.9963 Eukaryote
dbCAN_4:HMMER 0.9920 0.0341 0.9874 0.9967 Eukaryote
dbCAN_4:DIAMOND 0.9890 0.0360 0.9841 0.9939 Eukaryote
dbCAN_4:dbCAN-sub 0.9908 0.0350 0.9861 0.9956 Eukaryote
CUPP 0.9937 0.0320 0.9894 0.9981 Eukaryote

8.1.2 Sensitivity

Table 8.2: Sensitivity of CAZy class classification by CAZyme classifier, per taxonomy group
Prediction_tool Mean Standard Deviation Lower CI Upper CI Tax_group
dbCAN_2 0.8830 0.1710 0.8592 0.9068 Bacteria
dbCAN_2:HMMER 0.8194 0.2217 0.7885 0.8502 Bacteria
dbCAN_2:DIAMOND 0.8889 0.1819 0.8636 0.9142 Bacteria
dbCAN_2:Hotpep 0.8176 0.2115 0.7882 0.8470 Bacteria
dbCAN_3 0.9128 0.1608 0.8905 0.9352 Bacteria
dbCAN_3:HMMER 0.8256 0.2214 0.7948 0.8564 Bacteria
dbCAN_3:DIAMOND 0.9334 0.1427 0.9136 0.9533 Bacteria
dbCAN_3:eCAMI 0.8435 0.2067 0.8148 0.8722 Bacteria
dbCAN_4 0.9477 0.1076 0.9327 0.9626 Bacteria
dbCAN_4:HMMER 0.8545 0.1946 0.8275 0.8816 Bacteria
dbCAN_4:DIAMOND 0.9317 0.1461 0.9114 0.9520 Bacteria
dbCAN_4:dbCAN-sub 0.9558 0.1011 0.9417 0.9699 Bacteria
CUPP 0.7168 0.3818 0.6637 0.7699 Bacteria
dbCAN_2 0.8644 0.1754 0.8405 0.8882 Eukaryote
dbCAN_2:HMMER 0.7960 0.2670 0.7596 0.8323 Eukaryote
dbCAN_2:DIAMOND 0.8583 0.2229 0.8280 0.8887 Eukaryote
dbCAN_2:Hotpep 0.7877 0.1978 0.7608 0.8146 Eukaryote
dbCAN_3 0.9241 0.1291 0.9066 0.9417 Eukaryote
dbCAN_3:HMMER 0.8252 0.2649 0.7892 0.8613 Eukaryote
dbCAN_3:DIAMOND 0.9649 0.1093 0.9500 0.9798 Eukaryote
dbCAN_3:eCAMI 0.7790 0.2176 0.7494 0.8086 Eukaryote
dbCAN_4 0.9234 0.1457 0.9036 0.9433 Eukaryote
dbCAN_4:HMMER 0.8317 0.2562 0.7968 0.8665 Eukaryote
dbCAN_4:DIAMOND 0.9831 0.0487 0.9765 0.9898 Eukaryote
dbCAN_4:dbCAN-sub 0.9189 0.1433 0.8994 0.9384 Eukaryote
CUPP 0.7121 0.3682 0.6620 0.7622 Eukaryote

8.1.3 Precision

Table 8.3: Precision of CAZy class classification by CAZyme classifier, per taxonomy group
Prediction_tool Mean Standard Deviation Lower CI Upper CI Tax_group
dbCAN_2 0.9664 0.1215 0.9495 0.9833 Bacteria
dbCAN_2:HMMER 0.9706 0.0968 0.9571 0.9840 Bacteria
dbCAN_2:DIAMOND 0.9647 0.1349 0.9459 0.9834 Bacteria
dbCAN_2:Hotpep 0.8551 0.2420 0.8214 0.8887 Bacteria
dbCAN_3 0.9686 0.1160 0.9525 0.9848 Bacteria
dbCAN_3:HMMER 0.9725 0.0929 0.9596 0.9854 Bacteria
dbCAN_3:DIAMOND 0.9712 0.1113 0.9557 0.9867 Bacteria
dbCAN_3:eCAMI 0.8865 0.2140 0.8568 0.9162 Bacteria
dbCAN_4 0.9783 0.0746 0.9680 0.9887 Bacteria
dbCAN_4:HMMER 0.9755 0.0855 0.9636 0.9873 Bacteria
dbCAN_4:DIAMOND 0.9734 0.1103 0.9580 0.9887 Bacteria
dbCAN_4:dbCAN-sub 0.9767 0.0751 0.9663 0.9872 Bacteria
CUPP 0.7806 0.4022 0.7246 0.8365 Bacteria
dbCAN_2 0.9476 0.1358 0.9291 0.9661 Eukaryote
dbCAN_2:HMMER 0.9390 0.1748 0.9152 0.9628 Eukaryote
dbCAN_2:DIAMOND 0.9333 0.1738 0.9097 0.9569 Eukaryote
dbCAN_2:Hotpep 0.8568 0.2458 0.8233 0.8902 Eukaryote
dbCAN_3 0.9622 0.1043 0.9480 0.9764 Eukaryote
dbCAN_3:HMMER 0.9427 0.1653 0.9202 0.9651 Eukaryote
dbCAN_3:DIAMOND 0.9462 0.1359 0.9277 0.9646 Eukaryote
dbCAN_3:eCAMI 0.8894 0.2221 0.8592 0.9196 Eukaryote
dbCAN_4 0.9589 0.1086 0.9442 0.9737 Eukaryote
dbCAN_4:HMMER 0.9483 0.1511 0.9277 0.9688 Eukaryote
dbCAN_4:DIAMOND 0.9537 0.1191 0.9375 0.9699 Eukaryote
dbCAN_4:dbCAN-sub 0.9524 0.1182 0.9364 0.9685 Eukaryote
CUPP 0.7765 0.3957 0.7227 0.8304 Eukaryote

8.1.4 F1-score

Table 8.4: F1-score of CAZy class classification by CAZyme classifier, per taxonomy group
Prediction_tool Mean Standard Deviation Lower CI Upper CI Tax_group
dbCAN_2 0.9120 0.1418 0.8923 0.9317 Bacteria
dbCAN_2:HMMER 0.8680 0.1741 0.8438 0.8922 Bacteria
dbCAN_2:DIAMOND 0.9136 0.1536 0.8922 0.9349 Bacteria
dbCAN_2:Hotpep 0.8140 0.2090 0.7850 0.8431 Bacteria
dbCAN_3 0.9298 0.1366 0.9108 0.9488 Bacteria
dbCAN_3:HMMER 0.8729 0.1729 0.8489 0.8970 Bacteria
dbCAN_3:DIAMOND 0.9434 0.1209 0.9266 0.9602 Bacteria
dbCAN_3:eCAMI 0.8506 0.1978 0.8232 0.8781 Bacteria
dbCAN_4 0.9567 0.0853 0.9449 0.9686 Bacteria
dbCAN_4:HMMER 0.8949 0.1487 0.8742 0.9156 Bacteria
dbCAN_4:DIAMOND 0.9432 0.1237 0.9260 0.9604 Bacteria
dbCAN_4:dbCAN-sub 0.9607 0.0815 0.9493 0.9720 Bacteria
CUPP 0.7409 0.3859 0.6872 0.7946 Bacteria
dbCAN_2 0.8934 0.1458 0.8736 0.9133 Eukaryote
dbCAN_2:HMMER 0.8353 0.2285 0.8042 0.8663 Eukaryote
dbCAN_2:DIAMOND 0.8794 0.1936 0.8531 0.9057 Eukaryote
dbCAN_2:Hotpep 0.7977 0.2034 0.7700 0.8253 Eukaryote
dbCAN_3 0.9339 0.1028 0.9199 0.9478 Eukaryote
dbCAN_3:HMMER 0.8532 0.2236 0.8227 0.8836 Eukaryote
dbCAN_3:DIAMOND 0.9484 0.1137 0.9329 0.9638 Eukaryote
dbCAN_3:eCAMI 0.8147 0.2055 0.7867 0.8426 Eukaryote
dbCAN_4 0.9301 0.1179 0.9140 0.9461 Eukaryote
dbCAN_4:HMMER 0.8591 0.2146 0.8299 0.8883 Eukaryote
dbCAN_4:DIAMOND 0.9629 0.0818 0.9517 0.9740 Eukaryote
dbCAN_4:dbCAN-sub 0.9249 0.1207 0.9085 0.9413 Eukaryote
CUPP 0.7370 0.3750 0.6860 0.7880 Eukaryote

8.1.5 Accuracy

Table 8.5: Accuracy of CAZy class classification by CAZyme classifier, per taxonomy group
Prediction_tool Mean Standard Deviation Lower CI Upper CI Tax_group
dbCAN_2 0.9747 0.0436 0.9687 0.9808 Bacteria
dbCAN_2:HMMER 0.9646 0.0478 0.9580 0.9713 Bacteria
dbCAN_2:DIAMOND 0.9766 0.0455 0.9703 0.9830 Bacteria
dbCAN_2:Hotpep 0.9366 0.0664 0.9274 0.9458 Bacteria
dbCAN_3 0.9813 0.0408 0.9757 0.9870 Bacteria
dbCAN_3:HMMER 0.9658 0.0480 0.9591 0.9725 Bacteria
dbCAN_3:DIAMOND 0.9844 0.0387 0.9790 0.9898 Bacteria
dbCAN_3:eCAMI 0.9578 0.0539 0.9503 0.9653 Bacteria
dbCAN_4 0.9855 0.0392 0.9801 0.9910 Bacteria
dbCAN_4:HMMER 0.9684 0.0473 0.9618 0.9750 Bacteria
dbCAN_4:DIAMOND 0.9839 0.0400 0.9784 0.9895 Bacteria
dbCAN_4:dbCAN-sub 0.9864 0.0359 0.9814 0.9914 Bacteria
CUPP 0.9516 0.0743 0.9413 0.9620 Bacteria
dbCAN_2 0.9728 0.0431 0.9670 0.9787 Eukaryote
dbCAN_2:HMMER 0.9656 0.0332 0.9611 0.9702 Eukaryote
dbCAN_2:DIAMOND 0.9759 0.0448 0.9698 0.9820 Eukaryote
dbCAN_2:Hotpep 0.9401 0.0698 0.9306 0.9496 Eukaryote
dbCAN_3 0.9825 0.0259 0.9790 0.9860 Eukaryote
dbCAN_3:HMMER 0.9690 0.0331 0.9645 0.9735 Eukaryote
dbCAN_3:DIAMOND 0.9882 0.0241 0.9850 0.9915 Eukaryote
dbCAN_3:eCAMI 0.9528 0.0587 0.9449 0.9608 Eukaryote
dbCAN_4 0.9822 0.0252 0.9788 0.9857 Eukaryote
dbCAN_4:HMMER 0.9695 0.0325 0.9650 0.9739 Eukaryote
dbCAN_4:DIAMOND 0.9900 0.0226 0.9870 0.9931 Eukaryote
dbCAN_4:dbCAN-sub 0.9803 0.0264 0.9767 0.9839 Eukaryote
CUPP 0.9559 0.0466 0.9496 0.9623 Eukaryote

8.2 Per CAZy class

8.2.1 Specificity

Table 8.6: Specificity of CAZy class classification by CAZyme classifier, per taxonomy group and CAZy class
Prediction_tool Mean Standard Deviation LowerCI UpperCI Tax_group CAZy_class
dbCAN_2 0.9917 0.0245 0.9862 0.9972 All GH
dbCAN_2:HMMER 0.9927 0.0242 0.9873 0.9981 All GH
dbCAN_2:DIAMOND 0.9862 0.0276 0.9801 0.9924 All GH
dbCAN_2:Hotpep 0.9833 0.0286 0.9769 0.9897 All GH
dbCAN_3 0.9923 0.0228 0.9873 0.9974 All GH
dbCAN_3:HMMER 0.9939 0.0224 0.9889 0.9988 All GH
dbCAN_3:DIAMOND 0.9844 0.0292 0.9779 0.9909 All GH
dbCAN_3:eCAMI 0.9848 0.0295 0.9782 0.9913 All GH
dbCAN_4 0.9927 0.0240 0.9873 0.9980 All GH
dbCAN_4:HMMER 0.9939 0.0224 0.9889 0.9988 All GH
dbCAN_4:DIAMOND 0.9864 0.0279 0.9802 0.9927 All GH
dbCAN_4:dbCAN-sub 0.9924 0.0238 0.9871 0.9977 All GH
CUPP 0.9933 0.0212 0.9886 0.9980 All GH
dbCAN_2 0.9932 0.0144 0.9886 0.9978 Bacteria GH
dbCAN_2:HMMER 0.9945 0.0133 0.9902 0.9987 Bacteria GH
dbCAN_2:DIAMOND 0.9844 0.0230 0.9770 0.9918 Bacteria GH
dbCAN_2:Hotpep 0.9753 0.0283 0.9662 0.9843 Bacteria GH
dbCAN_3 0.9932 0.0137 0.9888 0.9975 Bacteria GH
dbCAN_3:HMMER 0.9954 0.0124 0.9914 0.9993 Bacteria GH
dbCAN_3:DIAMOND 0.9846 0.0239 0.9769 0.9922 Bacteria GH
dbCAN_3:eCAMI 0.9794 0.0278 0.9705 0.9883 Bacteria GH
dbCAN_4 0.9933 0.0141 0.9888 0.9978 Bacteria GH
dbCAN_4:HMMER 0.9954 0.0124 0.9914 0.9993 Bacteria GH
dbCAN_4:DIAMOND 0.9891 0.0196 0.9829 0.9954 Bacteria GH
dbCAN_4:dbCAN-sub 0.9927 0.0143 0.9881 0.9973 Bacteria GH
CUPP 0.9935 0.0140 0.9890 0.9980 Bacteria GH
dbCAN_2 0.9902 0.0318 0.9801 1.0004 Eukaryote GH
dbCAN_2:HMMER 0.9909 0.0317 0.9808 1.0010 Eukaryote GH
dbCAN_2:DIAMOND 0.9881 0.0318 0.9779 0.9982 Eukaryote GH
dbCAN_2:Hotpep 0.9913 0.0269 0.9827 0.9999 Eukaryote GH
dbCAN_3 0.9915 0.0295 0.9821 1.0009 Eukaryote GH
dbCAN_3:HMMER 0.9923 0.0293 0.9830 1.0017 Eukaryote GH
dbCAN_3:DIAMOND 0.9843 0.0339 0.9734 0.9951 Eukaryote GH
dbCAN_3:eCAMI 0.9901 0.0306 0.9803 0.9998 Eukaryote GH
dbCAN_4 0.9920 0.0312 0.9821 1.0020 Eukaryote GH
dbCAN_4:HMMER 0.9923 0.0293 0.9830 1.0017 Eukaryote GH
dbCAN_4:DIAMOND 0.9837 0.0344 0.9727 0.9947 Eukaryote GH
dbCAN_4:dbCAN-sub 0.9921 0.0308 0.9823 1.0019 Eukaryote GH
CUPP 0.9931 0.0268 0.9845 1.0016 Eukaryote GH
dbCAN_2 0.9927 0.0454 0.9826 1.0028 All GT
dbCAN_2:HMMER 0.9904 0.0487 0.9796 1.0012 All GT
dbCAN_2:DIAMOND 0.9919 0.0460 0.9817 1.0021 All GT
dbCAN_2:Hotpep 0.9924 0.0421 0.9830 1.0017 All GT
dbCAN_3 0.9914 0.0474 0.9808 1.0019 All GT
dbCAN_3:HMMER 0.9904 0.0487 0.9796 1.0012 All GT
dbCAN_3:DIAMOND 0.9893 0.0486 0.9784 1.0001 All GT
dbCAN_3:eCAMI 0.9922 0.0439 0.9824 1.0019 All GT
dbCAN_4 0.9900 0.0492 0.9790 1.0009 All GT
dbCAN_4:HMMER 0.9904 0.0487 0.9796 1.0012 All GT
dbCAN_4:DIAMOND 0.9893 0.0488 0.9785 1.0002 All GT
dbCAN_4:dbCAN-sub 0.9900 0.0492 0.9790 1.0009 All GT
CUPP 0.9921 0.0463 0.9818 1.0024 All GT
dbCAN_2 0.9991 0.0041 0.9978 1.0004 Bacteria GT
dbCAN_2:HMMER 0.9990 0.0044 0.9976 1.0004 Bacteria GT
dbCAN_2:DIAMOND 0.9986 0.0070 0.9963 1.0008 Bacteria GT
dbCAN_2:Hotpep 0.9976 0.0060 0.9957 0.9995 Bacteria GT
dbCAN_3 0.9991 0.0042 0.9977 1.0004 Bacteria GT
dbCAN_3:HMMER 0.9990 0.0044 0.9976 1.0004 Bacteria GT
dbCAN_3:DIAMOND 0.9983 0.0073 0.9959 1.0006 Bacteria GT
dbCAN_3:eCAMI 0.9976 0.0083 0.9949 1.0003 Bacteria GT
dbCAN_4 0.9987 0.0047 0.9972 1.0002 Bacteria GT
dbCAN_4:HMMER 0.9990 0.0044 0.9976 1.0004 Bacteria GT
dbCAN_4:DIAMOND 0.9983 0.0073 0.9959 1.0006 Bacteria GT
dbCAN_4:dbCAN-sub 0.9987 0.0047 0.9972 1.0002 Bacteria GT
CUPP 0.9994 0.0039 0.9982 1.0006 Bacteria GT
dbCAN_2 0.9864 0.0638 0.9659 1.0068 Eukaryote GT
dbCAN_2:HMMER 0.9818 0.0680 0.9600 1.0035 Eukaryote GT
dbCAN_2:DIAMOND 0.9852 0.0644 0.9646 1.0058 Eukaryote GT
dbCAN_2:Hotpep 0.9871 0.0592 0.9682 1.0060 Eukaryote GT
dbCAN_3 0.9837 0.0664 0.9625 1.0050 Eukaryote GT
dbCAN_3:HMMER 0.9818 0.0680 0.9600 1.0035 Eukaryote GT
dbCAN_3:DIAMOND 0.9803 0.0676 0.9587 1.0019 Eukaryote GT
dbCAN_3:eCAMI 0.9867 0.0614 0.9671 1.0064 Eukaryote GT
dbCAN_4 0.9812 0.0687 0.9592 1.0032 Eukaryote GT
dbCAN_4:HMMER 0.9818 0.0680 0.9600 1.0035 Eukaryote GT
dbCAN_4:DIAMOND 0.9804 0.0678 0.9587 1.0021 Eukaryote GT
dbCAN_4:dbCAN-sub 0.9812 0.0687 0.9592 1.0032 Eukaryote GT
CUPP 0.9848 0.0649 0.9640 1.0056 Eukaryote GT
dbCAN_2 0.9998 0.0012 0.9995 1.0002 All PL
dbCAN_2:HMMER 0.9998 0.0012 0.9995 1.0002 All PL
dbCAN_2:DIAMOND 0.9996 0.0019 0.9991 1.0002 All PL
dbCAN_2:Hotpep 0.9993 0.0027 0.9985 1.0001 All PL
dbCAN_3 0.9994 0.0025 0.9986 1.0001 All PL
dbCAN_3:HMMER 0.9994 0.0025 0.9986 1.0001 All PL
dbCAN_3:DIAMOND 0.9989 0.0039 0.9978 1.0000 All PL
dbCAN_3:eCAMI 0.9996 0.0018 0.9991 1.0002 All PL
dbCAN_4 0.9994 0.0025 0.9986 1.0001 All PL
dbCAN_4:HMMER 0.9994 0.0025 0.9986 1.0001 All PL
dbCAN_4:DIAMOND 0.9989 0.0039 0.9978 1.0000 All PL
dbCAN_4:dbCAN-sub 0.9994 0.0025 0.9986 1.0001 All PL
CUPP 0.9996 0.0019 0.9990 1.0002 All PL
dbCAN_2 1.0000 0.0000 1.0000 1.0000 Bacteria PL
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria PL
dbCAN_2:DIAMOND 0.9997 0.0018 0.9990 1.0003 Bacteria PL
dbCAN_2:Hotpep 0.9992 0.0030 0.9981 1.0003 Bacteria PL
dbCAN_3 0.9996 0.0021 0.9988 1.0004 Bacteria PL
dbCAN_3:HMMER 0.9996 0.0021 0.9988 1.0004 Bacteria PL
dbCAN_3:DIAMOND 0.9993 0.0028 0.9982 1.0003 Bacteria PL
dbCAN_3:eCAMI 0.9997 0.0017 0.9991 1.0003 Bacteria PL
dbCAN_4 0.9996 0.0021 0.9988 1.0004 Bacteria PL
dbCAN_4:HMMER 0.9996 0.0021 0.9988 1.0004 Bacteria PL
dbCAN_4:DIAMOND 0.9993 0.0028 0.9982 1.0003 Bacteria PL
dbCAN_4:dbCAN-sub 0.9996 0.0021 0.9988 1.0004 Bacteria PL
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria PL
dbCAN_2 0.9995 0.0021 0.9984 1.0006 Eukaryote PL
dbCAN_2:HMMER 0.9995 0.0021 0.9984 1.0006 Eukaryote PL
dbCAN_2:DIAMOND 0.9995 0.0021 0.9984 1.0006 Eukaryote PL
dbCAN_2:Hotpep 0.9995 0.0021 0.9984 1.0006 Eukaryote PL
dbCAN_3 0.9989 0.0031 0.9973 1.0005 Eukaryote PL
dbCAN_3:HMMER 0.9989 0.0031 0.9973 1.0005 Eukaryote PL
dbCAN_3:DIAMOND 0.9982 0.0054 0.9955 1.0010 Eukaryote PL
dbCAN_3:eCAMI 0.9995 0.0021 0.9984 1.0006 Eukaryote PL
dbCAN_4 0.9989 0.0030 0.9974 1.0005 Eukaryote PL
dbCAN_4:HMMER 0.9989 0.0031 0.9973 1.0005 Eukaryote PL
dbCAN_4:DIAMOND 0.9982 0.0053 0.9955 1.0010 Eukaryote PL
dbCAN_4:dbCAN-sub 0.9989 0.0030 0.9974 1.0005 Eukaryote PL
CUPP 0.9989 0.0031 0.9973 1.0005 Eukaryote PL
dbCAN_2 0.9937 0.0220 0.9887 0.9988 All CE
dbCAN_2:HMMER 0.9945 0.0187 0.9902 0.9988 All CE
dbCAN_2:DIAMOND 0.9941 0.0210 0.9893 0.9989 All CE
dbCAN_2:Hotpep 0.9905 0.0229 0.9852 0.9957 All CE
dbCAN_3 0.9936 0.0224 0.9885 0.9988 All CE
dbCAN_3:HMMER 0.9953 0.0179 0.9912 0.9994 All CE
dbCAN_3:DIAMOND 0.9925 0.0234 0.9872 0.9979 All CE
dbCAN_3:eCAMI 0.9926 0.0213 0.9877 0.9975 All CE
dbCAN_4 0.9953 0.0179 0.9912 0.9994 All CE
dbCAN_4:HMMER 0.9953 0.0179 0.9912 0.9994 All CE
dbCAN_4:DIAMOND 0.9931 0.0227 0.9879 0.9983 All CE
dbCAN_4:dbCAN-sub 0.9952 0.0181 0.9911 0.9993 All CE
CUPP 0.9955 0.0178 0.9914 0.9995 All CE
dbCAN_2 0.9886 0.0294 0.9792 0.9980 Bacteria CE
dbCAN_2:HMMER 0.9906 0.0250 0.9826 0.9986 Bacteria CE
dbCAN_2:DIAMOND 0.9891 0.0281 0.9801 0.9981 Bacteria CE
dbCAN_2:Hotpep 0.9829 0.0295 0.9735 0.9924 Bacteria CE
dbCAN_3 0.9884 0.0299 0.9789 0.9980 Bacteria CE
dbCAN_3:HMMER 0.9919 0.0241 0.9842 0.9996 Bacteria CE
dbCAN_3:DIAMOND 0.9894 0.0285 0.9803 0.9985 Bacteria CE
dbCAN_3:eCAMI 0.9862 0.0279 0.9773 0.9951 Bacteria CE
dbCAN_4 0.9919 0.0241 0.9842 0.9996 Bacteria CE
dbCAN_4:HMMER 0.9919 0.0241 0.9842 0.9996 Bacteria CE
dbCAN_4:DIAMOND 0.9904 0.0276 0.9816 0.9993 Bacteria CE
dbCAN_4:dbCAN-sub 0.9919 0.0241 0.9842 0.9996 Bacteria CE
CUPP 0.9919 0.0239 0.9843 0.9995 Bacteria CE
dbCAN_2 0.9994 0.0035 0.9982 1.0006 Eukaryote CE
dbCAN_2:HMMER 0.9988 0.0042 0.9974 1.0003 Eukaryote CE
dbCAN_2:DIAMOND 0.9997 0.0018 0.9991 1.0003 Eukaryote CE
dbCAN_2:Hotpep 0.9989 0.0042 0.9974 1.0003 Eukaryote CE
dbCAN_3 0.9994 0.0035 0.9982 1.0006 Eukaryote CE
dbCAN_3:HMMER 0.9991 0.0039 0.9978 1.0004 Eukaryote CE
dbCAN_3:DIAMOND 0.9960 0.0156 0.9907 1.0013 Eukaryote CE
dbCAN_3:eCAMI 0.9997 0.0018 0.9991 1.0003 Eukaryote CE
dbCAN_4 0.9991 0.0038 0.9978 1.0004 Eukaryote CE
dbCAN_4:HMMER 0.9991 0.0039 0.9978 1.0004 Eukaryote CE
dbCAN_4:DIAMOND 0.9960 0.0155 0.9908 1.0013 Eukaryote CE
dbCAN_4:dbCAN-sub 0.9989 0.0054 0.9970 1.0007 Eukaryote CE
CUPP 0.9994 0.0035 0.9982 1.0006 Eukaryote CE
dbCAN_2 0.9919 0.0178 0.9868 0.9971 All AA
dbCAN_2:HMMER 0.9913 0.0187 0.9859 0.9967 All AA
dbCAN_2:DIAMOND 0.9922 0.0178 0.9870 0.9974 All AA
dbCAN_2:Hotpep 0.9920 0.0180 0.9867 0.9972 All AA
dbCAN_3 0.9911 0.0199 0.9853 0.9969 All AA
dbCAN_3:HMMER 0.9907 0.0202 0.9848 0.9965 All AA
dbCAN_3:DIAMOND 0.9900 0.0219 0.9837 0.9964 All AA
dbCAN_3:eCAMI 0.9921 0.0167 0.9873 0.9970 All AA
dbCAN_4 0.9907 0.0201 0.9848 0.9965 All AA
dbCAN_4:HMMER 0.9907 0.0202 0.9848 0.9965 All AA
dbCAN_4:DIAMOND 0.9903 0.0219 0.9839 0.9966 All AA
dbCAN_4:dbCAN-sub 0.9905 0.0202 0.9846 0.9963 All AA
CUPP 0.9919 0.0175 0.9868 0.9970 All AA
dbCAN_2 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:Hotpep 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:eCAMI 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:dbCAN-sub 1.0000 0.0000 1.0000 1.0000 Bacteria AA
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2 0.9896 0.0197 0.9830 0.9961 Eukaryote AA
dbCAN_2:HMMER 0.9887 0.0206 0.9818 0.9956 Eukaryote AA
dbCAN_2:DIAMOND 0.9899 0.0198 0.9833 0.9965 Eukaryote AA
dbCAN_2:Hotpep 0.9896 0.0200 0.9829 0.9962 Eukaryote AA
dbCAN_3 0.9884 0.0220 0.9811 0.9958 Eukaryote AA
dbCAN_3:HMMER 0.9879 0.0223 0.9804 0.9953 Eukaryote AA
dbCAN_3:DIAMOND 0.9871 0.0242 0.9790 0.9951 Eukaryote AA
dbCAN_3:eCAMI 0.9898 0.0184 0.9836 0.9959 Eukaryote AA
dbCAN_4 0.9879 0.0223 0.9805 0.9953 Eukaryote AA
dbCAN_4:HMMER 0.9879 0.0223 0.9804 0.9953 Eukaryote AA
dbCAN_4:DIAMOND 0.9874 0.0242 0.9793 0.9954 Eukaryote AA
dbCAN_4:dbCAN-sub 0.9876 0.0223 0.9802 0.9951 Eukaryote AA
CUPP 0.9895 0.0193 0.9831 0.9960 Eukaryote AA
dbCAN_2 0.9922 0.0119 0.9896 0.9948 All CBM
dbCAN_2:HMMER 0.9961 0.0088 0.9941 0.9980 All CBM
dbCAN_2:DIAMOND 0.9914 0.0149 0.9881 0.9947 All CBM
dbCAN_2:Hotpep 0.9014 0.0511 0.8901 0.9128 All CBM
dbCAN_3 0.9943 0.0109 0.9919 0.9968 All CBM
dbCAN_3:HMMER 0.9960 0.0088 0.9941 0.9980 All CBM
dbCAN_3:DIAMOND 0.9931 0.0144 0.9899 0.9963 All CBM
dbCAN_3:eCAMI 0.9470 0.0523 0.9354 0.9587 All CBM
dbCAN_4 0.9951 0.0103 0.9928 0.9974 All CBM
dbCAN_4:HMMER 0.9958 0.0097 0.9937 0.9980 All CBM
dbCAN_4:DIAMOND 0.9938 0.0146 0.9906 0.9970 All CBM
dbCAN_4:dbCAN-sub 0.9927 0.0129 0.9898 0.9955 All CBM
CUPP 1.0000 0.0000 1.0000 1.0000 All CBM
dbCAN_2 0.9947 0.0091 0.9918 0.9976 Bacteria CBM
dbCAN_2:HMMER 0.9951 0.0106 0.9917 0.9985 Bacteria CBM
dbCAN_2:DIAMOND 0.9942 0.0146 0.9895 0.9988 Bacteria CBM
dbCAN_2:Hotpep 0.8917 0.0580 0.8732 0.9103 Bacteria CBM
dbCAN_3 0.9947 0.0110 0.9912 0.9982 Bacteria CBM
dbCAN_3:HMMER 0.9951 0.0106 0.9917 0.9985 Bacteria CBM
dbCAN_3:DIAMOND 0.9933 0.0164 0.9880 0.9985 Bacteria CBM
dbCAN_3:eCAMI 0.9349 0.0588 0.9161 0.9537 Bacteria CBM
dbCAN_4 0.9953 0.0106 0.9920 0.9987 Bacteria CBM
dbCAN_4:HMMER 0.9951 0.0106 0.9917 0.9985 Bacteria CBM
dbCAN_4:DIAMOND 0.9937 0.0154 0.9887 0.9986 Bacteria CBM
dbCAN_4:dbCAN-sub 0.9939 0.0113 0.9903 0.9975 Bacteria CBM
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria CBM
dbCAN_2 0.9897 0.0138 0.9853 0.9941 Eukaryote CBM
dbCAN_2:HMMER 0.9970 0.0066 0.9949 0.9991 Eukaryote CBM
dbCAN_2:DIAMOND 0.9886 0.0149 0.9839 0.9934 Eukaryote CBM
dbCAN_2:Hotpep 0.9112 0.0417 0.8978 0.9245 Eukaryote CBM
dbCAN_3 0.9940 0.0109 0.9905 0.9975 Eukaryote CBM
dbCAN_3:HMMER 0.9970 0.0066 0.9949 0.9991 Eukaryote CBM
dbCAN_3:DIAMOND 0.9929 0.0123 0.9889 0.9968 Eukaryote CBM
dbCAN_3:eCAMI 0.9591 0.0424 0.9456 0.9727 Eukaryote CBM
dbCAN_4 0.9949 0.0102 0.9916 0.9982 Eukaryote CBM
dbCAN_4:HMMER 0.9966 0.0087 0.9938 0.9993 Eukaryote CBM
dbCAN_4:DIAMOND 0.9939 0.0139 0.9895 0.9984 Eukaryote CBM
dbCAN_4:dbCAN-sub 0.9914 0.0144 0.9868 0.9961 Eukaryote CBM
CUPP 1.0000 0.0000 1.0000 1.0000 Eukaryote CBM
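
Several UpperCI values in these tables exceed 1.0 (for example 1.0006), although a score cannot be greater than 1. This is what a symmetric confidence interval around the mean produces when the mean is close to 1 and the bounds are not clipped. The sketch below illustrates the effect. It assumes a normal-approximation 95% interval over per-test-set scores; the interval method used for this report is not stated in this section, and the function name and example scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarise(scores, level=0.95):
    """Return (mean, sd, lower, upper) for a list of per-test-set scores.

    Assumption: a symmetric normal-approximation interval, with no
    clipping of the bounds to the [0, 1] range.
    """
    n = len(scores)
    m = mean(scores)
    sd = stdev(scores) if n > 1 else 0.0  # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sd / sqrt(n)
    return m, sd, m - half_width, m + half_width


# Hypothetical scores from six test sets, most of them perfect.
scores = [1.0, 1.0, 0.99, 1.0, 0.98, 1.0]
m, sd, lower, upper = summarise(scores)
print(f"mean={m:.4f} sd={sd:.4f} lower={lower:.4f} upper={upper:.4f}")
```

Rows with a standard deviation of 0.0000 collapse to an interval of zero width, which matches the rows where Mean, LowerCI and UpperCI are identical.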

8.2.2 Sensitivity
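
Sensitivity (recall) is the fraction of proteins that truly belong to a CAZy class and that a tool assigns to that class, TP / (TP + FN). A minimal sketch follows, assuming binary per-protein labels for a single class; the function name and example vectors are illustrative and are not taken from the evaluation code.

```python
def sensitivity(y_true, y_pred):
    """Sensitivity for one CAZy class from binary labels (1 = in class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumption: report 0.0 when the class is absent from the test set.
    return tp / (tp + fn) if (tp + fn) else 0.0


truth = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 1, 0, 0, 1]
print(sensitivity(truth, pred))     # 3 of the 4 true members are recovered
print(sensitivity(truth, [0] * 6))  # a tool that never predicts the class
```

A tool that never assigns a class scores 0 on every test set, which is one way a row of 0.0000 with zero standard deviation can arise.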

Table 8.7: Overall performance (measured by sensitivity) of CAZy class classification by the CAZyme classifiers, per taxonomy group
Prediction_tool Mean Standard Deviation LowerCI UpperCI Tax_group CAZy_class
dbCAN_2 0.9354 0.0907 0.9152 0.9556 All GH
dbCAN_2:HMMER 0.9080 0.0824 0.8896 0.9263 All GH
dbCAN_2:DIAMOND 0.9379 0.1064 0.9142 0.9615 All GH
dbCAN_2:Hotpep 0.8641 0.1181 0.8378 0.8904 All GH
dbCAN_3 0.9567 0.0771 0.9396 0.9739 All GH
dbCAN_3:HMMER 0.9198 0.0830 0.9013 0.9383 All GH
dbCAN_3:DIAMOND 0.9760 0.0673 0.9610 0.9910 All GH
dbCAN_3:eCAMI 0.8764 0.1097 0.8520 0.9008 All GH
dbCAN_4 0.9500 0.0802 0.9322 0.9679 All GH
dbCAN_4:HMMER 0.9202 0.0828 0.9017 0.9386 All GH
dbCAN_4:DIAMOND 0.9737 0.0763 0.9567 0.9906 All GH
dbCAN_4:dbCAN-sub 0.9473 0.0789 0.9297 0.9648 All GH
CUPP 0.9080 0.0675 0.8930 0.9230 All GH
dbCAN_2 0.9364 0.1023 0.9037 0.9691 Bacteria GH
dbCAN_2:HMMER 0.9117 0.1098 0.8766 0.9469 Bacteria GH
dbCAN_2:DIAMOND 0.9342 0.1159 0.8972 0.9713 Bacteria GH
dbCAN_2:Hotpep 0.8977 0.1006 0.8655 0.9298 Bacteria GH
dbCAN_3 0.9597 0.1006 0.9275 0.9919 Bacteria GH
dbCAN_3:HMMER 0.9166 0.1118 0.8808 0.9524 Bacteria GH
dbCAN_3:DIAMOND 0.9649 0.0900 0.9361 0.9936 Bacteria GH
dbCAN_3:eCAMI 0.9142 0.1063 0.8802 0.9482 Bacteria GH
dbCAN_4 0.9534 0.1089 0.9186 0.9882 Bacteria GH
dbCAN_4:HMMER 0.9173 0.1116 0.8816 0.9530 Bacteria GH
dbCAN_4:DIAMOND 0.9588 0.1047 0.9253 0.9923 Bacteria GH
dbCAN_4:dbCAN-sub 0.9549 0.1056 0.9211 0.9887 Bacteria GH
CUPP 0.9170 0.0792 0.8917 0.9423 Bacteria GH
dbCAN_2 0.9343 0.0788 0.9091 0.9595 Eukaryote GH
dbCAN_2:HMMER 0.9042 0.0411 0.8911 0.9173 Eukaryote GH
dbCAN_2:DIAMOND 0.9415 0.0973 0.9104 0.9726 Eukaryote GH
dbCAN_2:Hotpep 0.8305 0.1257 0.7903 0.8707 Eukaryote GH
dbCAN_3 0.9538 0.0435 0.9398 0.9677 Eukaryote GH
dbCAN_3:HMMER 0.9230 0.0375 0.9110 0.9350 Eukaryote GH
dbCAN_3:DIAMOND 0.9872 0.0290 0.9779 0.9964 Eukaryote GH
dbCAN_3:eCAMI 0.8385 0.1007 0.8063 0.8708 Eukaryote GH
dbCAN_4 0.9467 0.0341 0.9358 0.9576 Eukaryote GH
dbCAN_4:HMMER 0.9230 0.0375 0.9110 0.9350 Eukaryote GH
dbCAN_4:DIAMOND 0.9885 0.0193 0.9823 0.9947 Eukaryote GH
dbCAN_4:dbCAN-sub 0.9396 0.0363 0.9280 0.9512 Eukaryote GH
CUPP 0.8990 0.0527 0.8821 0.9159 Eukaryote GH
dbCAN_2 0.8845 0.1378 0.8538 0.9152 All GT
dbCAN_2:HMMER 0.8627 0.1126 0.8376 0.8877 All GT
dbCAN_2:DIAMOND 0.9255 0.1508 0.8919 0.9591 All GT
dbCAN_2:Hotpep 0.7254 0.1807 0.6852 0.7656 All GT
dbCAN_3 0.9421 0.0971 0.9205 0.9637 All GT
dbCAN_3:HMMER 0.8654 0.1113 0.8406 0.8901 All GT
dbCAN_3:DIAMOND 0.9774 0.0897 0.9574 0.9973 All GT
dbCAN_3:eCAMI 0.8500 0.1524 0.8161 0.8839 All GT
dbCAN_4 0.9578 0.0921 0.9373 0.9783 All GT
dbCAN_4:HMMER 0.8657 0.1110 0.8410 0.8904 All GT
dbCAN_4:DIAMOND 0.9751 0.0914 0.9548 0.9954 All GT
dbCAN_4:dbCAN-sub 0.9538 0.0835 0.9352 0.9724 All GT
CUPP 0.8536 0.1107 0.8289 0.8782 All GT
dbCAN_2 0.8819 0.1395 0.8373 0.9265 Bacteria GT
dbCAN_2:HMMER 0.8491 0.1312 0.8071 0.8910 Bacteria GT
dbCAN_2:DIAMOND 0.9240 0.1583 0.8733 0.9746 Bacteria GT
dbCAN_2:Hotpep 0.6785 0.1870 0.6187 0.7383 Bacteria GT
dbCAN_3 0.9324 0.1249 0.8924 0.9723 Bacteria GT
dbCAN_3:HMMER 0.8482 0.1301 0.8066 0.8898 Bacteria GT
dbCAN_3:DIAMOND 0.9658 0.1219 0.9268 1.0048 Bacteria GT
dbCAN_3:eCAMI 0.8549 0.1621 0.8030 0.9067 Bacteria GT
dbCAN_4 0.9448 0.1259 0.9046 0.9851 Bacteria GT
dbCAN_4:HMMER 0.8493 0.1301 0.8077 0.8909 Bacteria GT
dbCAN_4:DIAMOND 0.9546 0.1262 0.9142 0.9949 Bacteria GT
dbCAN_4:dbCAN-sub 0.9445 0.1117 0.9088 0.9802 Bacteria GT
CUPP 0.8754 0.1137 0.8391 0.9118 Bacteria GT
dbCAN_2 0.8871 0.1379 0.8430 0.9312 Eukaryote GT
dbCAN_2:HMMER 0.8762 0.0901 0.8474 0.9051 Eukaryote GT
dbCAN_2:DIAMOND 0.9271 0.1449 0.8807 0.9734 Eukaryote GT
dbCAN_2:Hotpep 0.7723 0.1633 0.7201 0.8246 Eukaryote GT
dbCAN_3 0.9519 0.0576 0.9334 0.9703 Eukaryote GT
dbCAN_3:HMMER 0.8825 0.0870 0.8547 0.9103 Eukaryote GT
dbCAN_3:DIAMOND 0.9890 0.0339 0.9782 0.9998 Eukaryote GT
dbCAN_3:eCAMI 0.8451 0.1440 0.7991 0.8912 Eukaryote GT
dbCAN_4 0.9708 0.0317 0.9607 0.9809 Eukaryote GT
dbCAN_4:HMMER 0.8821 0.0866 0.8544 0.9098 Eukaryote GT
dbCAN_4:DIAMOND 0.9956 0.0118 0.9919 0.9994 Eukaryote GT
dbCAN_4:dbCAN-sub 0.9631 0.0384 0.9508 0.9754 Eukaryote GT
CUPP 0.8317 0.1045 0.7983 0.8651 Eukaryote GT
dbCAN_2 0.8797 0.2421 0.8086 0.9508 All PL
dbCAN_2:HMMER 0.8975 0.2125 0.8351 0.9598 All PL
dbCAN_2:DIAMOND 0.8691 0.2687 0.7902 0.9480 All PL
dbCAN_2:Hotpep 0.8407 0.2581 0.7650 0.9165 All PL
dbCAN_3 0.9881 0.0732 0.9666 1.0096 All PL
dbCAN_3:HMMER 0.9739 0.0982 0.9451 1.0028 All PL
dbCAN_3:DIAMOND 0.9881 0.0732 0.9666 1.0096 All PL
dbCAN_3:eCAMI 0.7960 0.2776 0.7154 0.8766 All PL
dbCAN_4 0.9739 0.0982 0.9451 1.0028 All PL
dbCAN_4:HMMER 0.9739 0.0982 0.9451 1.0028 All PL
dbCAN_4:DIAMOND 0.9987 0.0086 0.9962 1.0013 All PL
dbCAN_4:dbCAN-sub 0.9752 0.0982 0.9464 1.0040 All PL
CUPP 0.8511 0.2593 0.7749 0.9272 All PL
dbCAN_2 0.9099 0.2101 0.8314 0.9884 Bacteria PL
dbCAN_2:HMMER 0.9210 0.1641 0.8597 0.9823 Bacteria PL
dbCAN_2:DIAMOND 0.9266 0.1958 0.8534 0.9997 Bacteria PL
dbCAN_2:Hotpep 0.8738 0.1959 0.8007 0.9470 Bacteria PL
dbCAN_3 0.9814 0.0915 0.9472 1.0156 Bacteria PL
dbCAN_3:HMMER 0.9592 0.1211 0.9139 1.0044 Bacteria PL
dbCAN_3:DIAMOND 0.9980 0.0107 0.9940 1.0020 Bacteria PL
dbCAN_3:eCAMI 0.8181 0.2466 0.7276 0.9085 Bacteria PL
dbCAN_4 0.9592 0.1211 0.9139 1.0044 Bacteria PL
dbCAN_4:HMMER 0.9592 0.1211 0.9139 1.0044 Bacteria PL
dbCAN_4:DIAMOND 0.9980 0.0107 0.9940 1.0020 Bacteria PL
dbCAN_4:dbCAN-sub 0.9611 0.1213 0.9158 1.0064 Bacteria PL
CUPP 0.8484 0.2520 0.7542 0.9425 Bacteria PL
dbCAN_2 0.8265 0.2895 0.6776 0.9753 Eukaryote PL
dbCAN_2:HMMER 0.8559 0.2794 0.7122 0.9996 Eukaryote PL
dbCAN_2:DIAMOND 0.7676 0.3477 0.5889 0.9464 Eukaryote PL
dbCAN_2:Hotpep 0.7824 0.3409 0.6071 0.9576 Eukaryote PL
dbCAN_3 1.0000 0.0000 1.0000 1.0000 Eukaryote PL
dbCAN_3:HMMER 1.0000 0.0000 1.0000 1.0000 Eukaryote PL
dbCAN_3:DIAMOND 0.9706 0.1213 0.9082 1.0329 Eukaryote PL
dbCAN_3:eCAMI 0.7559 0.3311 0.5856 0.9261 Eukaryote PL
dbCAN_4 1.0000 0.0000 1.0000 1.0000 Eukaryote PL
dbCAN_4:HMMER 1.0000 0.0000 1.0000 1.0000 Eukaryote PL
dbCAN_4:DIAMOND 1.0000 0.0000 1.0000 1.0000 Eukaryote PL
dbCAN_4:dbCAN-sub 1.0000 0.0000 1.0000 1.0000 Eukaryote PL
CUPP 0.8559 0.2794 0.7122 0.9996 Eukaryote PL
dbCAN_2 0.9213 0.1433 0.8886 0.9540 All CE
dbCAN_2:HMMER 0.9208 0.1363 0.8896 0.9519 All CE
dbCAN_2:DIAMOND 0.8480 0.2469 0.7915 0.9044 All CE
dbCAN_2:Hotpep 0.8508 0.2065 0.8036 0.8980 All CE
dbCAN_3 0.9283 0.1537 0.8932 0.9634 All CE
dbCAN_3:HMMER 0.9230 0.1335 0.8925 0.9535 All CE
dbCAN_3:DIAMOND 0.9303 0.1704 0.8913 0.9692 All CE
dbCAN_3:eCAMI 0.8073 0.2445 0.7514 0.8632 All CE
dbCAN_4 0.9783 0.0669 0.9630 0.9935 All CE
dbCAN_4:HMMER 0.9529 0.1012 0.9298 0.9760 All CE
dbCAN_4:DIAMOND 0.9755 0.0851 0.9560 0.9949 All CE
dbCAN_4:dbCAN-sub 0.9755 0.0680 0.9599 0.9910 All CE
CUPP 0.9114 0.1332 0.8810 0.9419 All CE
dbCAN_2 0.9296 0.1184 0.8917 0.9675 Bacteria CE
dbCAN_2:HMMER 0.8801 0.1503 0.8320 0.9282 Bacteria CE
dbCAN_2:DIAMOND 0.8797 0.1688 0.8257 0.9337 Bacteria CE
dbCAN_2:Hotpep 0.9226 0.1251 0.8826 0.9627 Bacteria CE
dbCAN_3 0.9210 0.1569 0.8708 0.9712 Bacteria CE
dbCAN_3:HMMER 0.8801 0.1503 0.8320 0.9282 Bacteria CE
dbCAN_3:DIAMOND 0.9342 0.1346 0.8911 0.9772 Bacteria CE
dbCAN_3:eCAMI 0.8639 0.1908 0.8029 0.9250 Bacteria CE
dbCAN_4 0.9670 0.0836 0.9403 0.9937 Bacteria CE
dbCAN_4:HMMER 0.9245 0.1269 0.8839 0.9650 Bacteria CE
dbCAN_4:DIAMOND 0.9618 0.1104 0.9265 0.9971 Bacteria CE
dbCAN_4:dbCAN-sub 0.9634 0.0852 0.9362 0.9907 Bacteria CE
CUPP 0.8983 0.1322 0.8560 0.9405 Bacteria CE
dbCAN_2 0.9120 0.1679 0.8552 0.9689 Eukaryote CE
dbCAN_2:HMMER 0.9660 0.1032 0.9311 1.0009 Eukaryote CE
dbCAN_2:DIAMOND 0.8127 0.3106 0.7076 0.9178 Eukaryote CE
dbCAN_2:Hotpep 0.7710 0.2479 0.6871 0.8549 Eukaryote CE
dbCAN_3 0.9363 0.1518 0.8850 0.9877 Eukaryote CE
dbCAN_3:HMMER 0.9706 0.0926 0.9393 1.0019 Eukaryote CE
dbCAN_3:DIAMOND 0.9259 0.2049 0.8566 0.9952 Eukaryote CE
dbCAN_3:eCAMI 0.7444 0.2825 0.6488 0.8400 Eukaryote CE
dbCAN_4 0.9907 0.0387 0.9776 1.0038 Eukaryote CE
dbCAN_4:HMMER 0.9845 0.0455 0.9691 0.9999 Eukaryote CE
dbCAN_4:DIAMOND 0.9907 0.0387 0.9776 1.0038 Eukaryote CE
dbCAN_4:dbCAN-sub 0.9888 0.0383 0.9759 1.0018 Eukaryote CE
CUPP 0.9260 0.1346 0.8805 0.9716 Eukaryote CE
dbCAN_2 0.9190 0.1152 0.8855 0.9524 All AA
dbCAN_2:HMMER 0.9375 0.0917 0.9109 0.9641 All AA
dbCAN_2:DIAMOND 0.8693 0.1786 0.8174 0.9212 All AA
dbCAN_2:Hotpep 0.8737 0.1942 0.8173 0.9301 All AA
dbCAN_3 0.9835 0.0450 0.9705 0.9966 All AA
dbCAN_3:HMMER 0.9880 0.0393 0.9766 0.9994 All AA
dbCAN_3:DIAMOND 0.9884 0.0359 0.9779 0.9988 All AA
dbCAN_3:eCAMI 0.7821 0.2611 0.7063 0.8579 All AA
dbCAN_4 0.9937 0.0258 0.9862 1.0012 All AA
dbCAN_4:HMMER 0.9880 0.0393 0.9766 0.9994 All AA
dbCAN_4:DIAMOND 0.9700 0.0952 0.9423 0.9976 All AA
dbCAN_4:dbCAN-sub 0.9892 0.0331 0.9796 0.9988 All AA
CUPP 0.9047 0.1300 0.8670 0.9425 All AA
dbCAN_2 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:Hotpep 0.9091 0.3015 0.7065 1.1116 Bacteria AA
dbCAN_3 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:eCAMI 0.9091 0.3015 0.7065 1.1116 Bacteria AA
dbCAN_4 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:DIAMOND 0.9545 0.1508 0.8533 1.0558 Bacteria AA
dbCAN_4:dbCAN-sub 1.0000 0.0000 1.0000 1.0000 Bacteria AA
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2 0.8949 0.1214 0.8544 0.9354 Eukaryote AA
dbCAN_2:HMMER 0.9189 0.0971 0.8865 0.9513 Eukaryote AA
dbCAN_2:DIAMOND 0.8305 0.1868 0.7682 0.8927 Eukaryote AA
dbCAN_2:Hotpep 0.8632 0.1533 0.8121 0.9143 Eukaryote AA
dbCAN_3 0.9786 0.0503 0.9619 0.9954 Eukaryote AA
dbCAN_3:HMMER 0.9844 0.0442 0.9697 0.9992 Eukaryote AA
dbCAN_3:DIAMOND 0.9849 0.0403 0.9715 0.9983 Eukaryote AA
dbCAN_3:eCAMI 0.7443 0.2395 0.6645 0.8242 Eukaryote AA
dbCAN_4 0.9918 0.0292 0.9821 1.0016 Eukaryote AA
dbCAN_4:HMMER 0.9844 0.0442 0.9697 0.9992 Eukaryote AA
dbCAN_4:DIAMOND 0.9746 0.0737 0.9500 0.9992 Eukaryote AA
dbCAN_4:dbCAN-sub 0.9860 0.0372 0.9736 0.9984 Eukaryote AA
CUPP 0.8764 0.1359 0.8311 0.9217 Eukaryote AA
dbCAN_2 0.7242 0.1925 0.6813 0.7670 All CBM
dbCAN_2:HMMER 0.4129 0.2165 0.3648 0.4611 All CBM
dbCAN_2:DIAMOND 0.7854 0.2191 0.7367 0.8342 All CBM
dbCAN_2:Hotpep 0.7061 0.2102 0.6594 0.7529 All CBM
dbCAN_3 0.7680 0.1926 0.7252 0.8109 All CBM
dbCAN_3:HMMER 0.4135 0.2148 0.3657 0.4613 All CBM
dbCAN_3:DIAMOND 0.8675 0.1731 0.8290 0.9060 All CBM
dbCAN_3:eCAMI 0.7346 0.2203 0.6856 0.7836 All CBM
dbCAN_4 0.7995 0.1921 0.7567 0.8422 All CBM
dbCAN_4:HMMER 0.4740 0.2360 0.4215 0.5266 All CBM
dbCAN_4:DIAMOND 0.8773 0.1695 0.8396 0.9151 All CBM
dbCAN_4:dbCAN-sub 0.8193 0.1995 0.7749 0.8637 All CBM
CUPP 0.0000 0.0000 0.0000 0.0000 All CBM
dbCAN_2 0.7317 0.2003 0.6676 0.7957 Bacteria CBM
dbCAN_2:HMMER 0.5107 0.2276 0.4379 0.5834 Bacteria CBM
dbCAN_2:DIAMOND 0.7590 0.2212 0.6883 0.8297 Bacteria CBM
dbCAN_2:Hotpep 0.7044 0.2472 0.6253 0.7835 Bacteria CBM
dbCAN_3 0.7630 0.2112 0.6954 0.8305 Bacteria CBM
dbCAN_3:HMMER 0.5092 0.2267 0.4367 0.5817 Bacteria CBM
dbCAN_3:DIAMOND 0.8023 0.1976 0.7391 0.8654 Bacteria CBM
dbCAN_3:eCAMI 0.7427 0.2415 0.6655 0.8199 Bacteria CBM
dbCAN_4 0.9025 0.1017 0.8699 0.9350 Bacteria CBM
dbCAN_4:HMMER 0.6086 0.2286 0.5355 0.6817 Bacteria CBM
dbCAN_4:DIAMOND 0.7955 0.1993 0.7318 0.8592 Bacteria CBM
dbCAN_4:dbCAN-sub 0.9441 0.0985 0.9126 0.9756 Bacteria CBM
CUPP 0.0000 0.0000 0.0000 0.0000 Bacteria CBM
dbCAN_2 0.7167 0.1866 0.6570 0.7763 Eukaryote CBM
dbCAN_2:HMMER 0.3152 0.1537 0.2661 0.3644 Eukaryote CBM
dbCAN_2:DIAMOND 0.8119 0.2166 0.7426 0.8811 Eukaryote CBM
dbCAN_2:Hotpep 0.7079 0.1684 0.6540 0.7617 Eukaryote CBM
dbCAN_3 0.7730 0.1746 0.7172 0.8289 Eukaryote CBM
dbCAN_3:HMMER 0.3177 0.1525 0.2690 0.3665 Eukaryote CBM
dbCAN_3:DIAMOND 0.9327 0.1137 0.8964 0.9691 Eukaryote CBM
dbCAN_3:eCAMI 0.7264 0.1996 0.6626 0.7903 Eukaryote CBM
dbCAN_4 0.6965 0.2066 0.6305 0.7626 Eukaryote CBM
dbCAN_4:HMMER 0.3395 0.1530 0.2906 0.3884 Eukaryote CBM
dbCAN_4:DIAMOND 0.9592 0.0692 0.9370 0.9813 Eukaryote CBM
dbCAN_4:dbCAN-sub 0.6945 0.1974 0.6314 0.7577 Eukaryote CBM
CUPP 0.0000 0.0000 0.0000 0.0000 Eukaryote CBM

8.2.3 Precision
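
Precision is the fraction of a tool's predictions for a CAZy class that are correct, TP / (TP + FP). It is undefined when a tool makes no predictions for the class, so a convention is needed for that case. The sketch below assumes such cases are scored as 0.0; this section does not state which convention the evaluation used, and the function and parameter names are hypothetical.

```python
def precision(y_true, y_pred, zero_division=0.0):
    """Precision for one CAZy class from binary labels (1 = in class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    predicted = tp + fp
    # Assumption: fall back to zero_division when nothing was predicted.
    return tp / predicted if predicted else zero_division


truth = [1, 1, 0, 0, 0]
pred = [1, 1, 1, 0, 0]
print(precision(truth, pred))     # 2 correct out of 3 predictions
print(precision(truth, [0] * 5))  # no predictions: falls back to 0.0
```

Under this convention, a tool that predicts no members of a class has both sensitivity and precision equal to 0 for that class.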

Table 8.8: Overall performance (measured by precision) of CAZy class classification by the CAZyme classifiers, per taxonomy group
Prediction_tool Mean Standard Deviation LowerCI UpperCI Tax_group CAZy_class
dbCAN_2 0.9886 0.0347 0.9809 0.9963 All GH
dbCAN_2:HMMER 0.9897 0.0339 0.9821 0.9972 All GH
dbCAN_2:DIAMOND 0.9845 0.0360 0.9765 0.9926 All GH
dbCAN_2:Hotpep 0.9796 0.0419 0.9702 0.9889 All GH
dbCAN_3 0.9897 0.0332 0.9823 0.9971 All GH
dbCAN_3:HMMER 0.9909 0.0330 0.9836 0.9983 All GH
dbCAN_3:DIAMOND 0.9830 0.0391 0.9743 0.9917 All GH
dbCAN_3:eCAMI 0.9829 0.0395 0.9741 0.9917 All GH
dbCAN_4 0.9898 0.0337 0.9823 0.9973 All GH
dbCAN_4:HMMER 0.9909 0.0330 0.9836 0.9983 All GH
dbCAN_4:DIAMOND 0.9842 0.0393 0.9754 0.9929 All GH
dbCAN_4:dbCAN-sub 0.9896 0.0337 0.9821 0.9971 All GH
CUPP 0.9906 0.0338 0.9831 0.9981 All GH
dbCAN_2 0.9908 0.0219 0.9838 0.9978 Bacteria GH
dbCAN_2:HMMER 0.9935 0.0153 0.9886 0.9984 Bacteria GH
dbCAN_2:DIAMOND 0.9841 0.0265 0.9757 0.9926 Bacteria GH
dbCAN_2:Hotpep 0.9736 0.0369 0.9618 0.9854 Bacteria GH
dbCAN_3 0.9917 0.0191 0.9856 0.9978 Bacteria GH
dbCAN_3:HMMER 0.9945 0.0142 0.9900 0.9991 Bacteria GH
dbCAN_3:DIAMOND 0.9858 0.0247 0.9779 0.9937 Bacteria GH
dbCAN_3:eCAMI 0.9805 0.0274 0.9718 0.9893 Bacteria GH
dbCAN_4 0.9923 0.0176 0.9867 0.9980 Bacteria GH
dbCAN_4:HMMER 0.9945 0.0142 0.9900 0.9991 Bacteria GH
dbCAN_4:DIAMOND 0.9886 0.0244 0.9808 0.9964 Bacteria GH
dbCAN_4:dbCAN-sub 0.9920 0.0175 0.9864 0.9976 Bacteria GH
CUPP 0.9931 0.0143 0.9886 0.9977 Bacteria GH
dbCAN_2 0.9863 0.0442 0.9722 1.0005 Eukaryote GH
dbCAN_2:HMMER 0.9859 0.0454 0.9714 1.0004 Eukaryote GH
dbCAN_2:DIAMOND 0.9850 0.0439 0.9709 0.9990 Eukaryote GH
dbCAN_2:Hotpep 0.9855 0.0461 0.9708 1.0003 Eukaryote GH
dbCAN_3 0.9877 0.0432 0.9739 1.0015 Eukaryote GH
dbCAN_3:HMMER 0.9873 0.0444 0.9731 1.0015 Eukaryote GH
dbCAN_3:DIAMOND 0.9801 0.0497 0.9642 0.9960 Eukaryote GH
dbCAN_3:eCAMI 0.9852 0.0489 0.9695 1.0008 Eukaryote GH
dbCAN_4 0.9873 0.0445 0.9730 1.0015 Eukaryote GH
dbCAN_4:HMMER 0.9873 0.0444 0.9731 1.0015 Eukaryote GH
dbCAN_4:DIAMOND 0.9797 0.0499 0.9638 0.9957 Eukaryote GH
dbCAN_4:dbCAN-sub 0.9872 0.0445 0.9730 1.0015 Eukaryote GH
CUPP 0.9880 0.0458 0.9734 1.0027 Eukaryote GH
dbCAN_2 0.9898 0.0542 0.9778 1.0019 All GT
dbCAN_2:HMMER 0.9884 0.0566 0.9758 1.0010 All GT
dbCAN_2:DIAMOND 0.9886 0.0567 0.9760 1.0013 All GT
dbCAN_2:Hotpep 0.9836 0.0688 0.9683 0.9989 All GT
dbCAN_3 0.9891 0.0563 0.9766 1.0016 All GT
dbCAN_3:HMMER 0.9884 0.0566 0.9758 1.0010 All GT
dbCAN_3:DIAMOND 0.9839 0.0620 0.9701 0.9977 All GT
dbCAN_3:eCAMI 0.9881 0.0572 0.9754 1.0009 All GT
dbCAN_4 0.9866 0.0566 0.9740 0.9992 All GT
dbCAN_4:HMMER 0.9884 0.0566 0.9758 1.0010 All GT
dbCAN_4:DIAMOND 0.9841 0.0638 0.9699 0.9983 All GT
dbCAN_4:dbCAN-sub 0.9866 0.0566 0.9740 0.9992 All GT
CUPP 0.9883 0.0625 0.9744 1.0022 All GT
dbCAN_2 0.9987 0.0058 0.9969 1.0006 Bacteria GT
dbCAN_2:HMMER 0.9988 0.0055 0.9970 1.0005 Bacteria GT
dbCAN_2:DIAMOND 0.9984 0.0069 0.9962 1.0007 Bacteria GT
dbCAN_2:Hotpep 0.9908 0.0250 0.9828 0.9988 Bacteria GT
dbCAN_3 0.9988 0.0053 0.9971 1.0005 Bacteria GT
dbCAN_3:HMMER 0.9988 0.0054 0.9971 1.0005 Bacteria GT
dbCAN_3:DIAMOND 0.9975 0.0088 0.9947 1.0004 Bacteria GT
dbCAN_3:eCAMI 0.9963 0.0112 0.9928 0.9999 Bacteria GT
dbCAN_4 0.9980 0.0074 0.9957 1.0004 Bacteria GT
dbCAN_4:HMMER 0.9988 0.0054 0.9971 1.0005 Bacteria GT
dbCAN_4:DIAMOND 0.9975 0.0091 0.9945 1.0004 Bacteria GT
dbCAN_4:dbCAN-sub 0.9980 0.0074 0.9957 1.0004 Bacteria GT
CUPP 0.9995 0.0030 0.9985 1.0005 Bacteria GT
dbCAN_2 0.9810 0.0759 0.9567 1.0052 Eukaryote GT
dbCAN_2:HMMER 0.9781 0.0790 0.9528 1.0033 Eukaryote GT
dbCAN_2:DIAMOND 0.9788 0.0792 0.9535 1.0042 Eukaryote GT
dbCAN_2:Hotpep 0.9764 0.0942 0.9463 1.0066 Eukaryote GT
dbCAN_3 0.9794 0.0787 0.9542 1.0045 Eukaryote GT
dbCAN_3:HMMER 0.9781 0.0790 0.9528 1.0033 Eukaryote GT
dbCAN_3:DIAMOND 0.9702 0.0856 0.9429 0.9976 Eukaryote GT
dbCAN_3:eCAMI 0.9799 0.0798 0.9544 1.0054 Eukaryote GT
dbCAN_4 0.9752 0.0785 0.9501 1.0003 Eukaryote GT
dbCAN_4:HMMER 0.9781 0.0790 0.9528 1.0033 Eukaryote GT
dbCAN_4:DIAMOND 0.9707 0.0883 0.9425 0.9990 Eukaryote GT
dbCAN_4:dbCAN-sub 0.9752 0.0785 0.9501 1.0003 Eukaryote GT
CUPP 0.9771 0.0874 0.9491 1.0050 Eukaryote GT
dbCAN_2 0.9532 0.2052 0.8929 1.0134 All PL
dbCAN_2:HMMER 0.9745 0.1481 0.9310 1.0180 All PL
dbCAN_2:DIAMOND 0.9248 0.2505 0.8513 0.9984 All PL
dbCAN_2:Hotpep 0.9506 0.2050 0.8904 1.0108 All PL
dbCAN_3 0.9846 0.0766 0.9621 1.0071 All PL
dbCAN_3:HMMER 0.9846 0.0766 0.9621 1.0071 All PL
dbCAN_3:DIAMOND 0.9730 0.1012 0.9433 1.0028 All PL
dbCAN_3:eCAMI 0.9333 0.2452 0.8621 1.0045 All PL
dbCAN_4 0.9846 0.0766 0.9621 1.0071 All PL
dbCAN_4:HMMER 0.9846 0.0766 0.9621 1.0071 All PL
dbCAN_4:DIAMOND 0.9730 0.1012 0.9433 1.0028 All PL
dbCAN_4:dbCAN-sub 0.9846 0.0766 0.9621 1.0071 All PL
CUPP 0.9496 0.2058 0.8892 1.0101 All PL
dbCAN_2 0.9667 0.1826 0.8985 1.0348 Bacteria PL
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria PL
dbCAN_2:DIAMOND 0.9556 0.1904 0.8844 1.0267 Bacteria PL
dbCAN_2:Hotpep 0.9960 0.0154 0.9902 1.0017 Bacteria PL
dbCAN_3 0.9980 0.0107 0.9940 1.0020 Bacteria PL
dbCAN_3:HMMER 0.9980 0.0107 0.9940 1.0020 Bacteria PL
dbCAN_3:DIAMOND 0.9869 0.0614 0.9640 1.0099 Bacteria PL
dbCAN_3:eCAMI 0.9677 0.1796 0.9019 1.0336 Bacteria PL
dbCAN_4 0.9980 0.0107 0.9940 1.0020 Bacteria PL
dbCAN_4:HMMER 0.9980 0.0107 0.9940 1.0020 Bacteria PL
dbCAN_4:DIAMOND 0.9869 0.0614 0.9640 1.0099 Bacteria PL
dbCAN_4:dbCAN-sub 0.9980 0.0107 0.9940 1.0020 Bacteria PL
CUPP 0.9667 0.1826 0.8985 1.0348 Bacteria PL
dbCAN_2 0.9294 0.2443 0.8038 1.0550 Eukaryote PL
dbCAN_2:HMMER 0.9294 0.2443 0.8038 1.0550 Eukaryote PL
dbCAN_2:DIAMOND 0.8706 0.3312 0.7003 1.0409 Eukaryote PL
dbCAN_2:Hotpep 0.8706 0.3312 0.7003 1.0409 Eukaryote PL
dbCAN_3 0.9608 0.1254 0.8963 1.0253 Eukaryote PL
dbCAN_3:HMMER 0.9608 0.1254 0.8963 1.0253 Eukaryote PL
dbCAN_3:DIAMOND 0.9485 0.1470 0.8730 1.0241 Eukaryote PL
dbCAN_3:eCAMI 0.8706 0.3312 0.7003 1.0409 Eukaryote PL
dbCAN_4 0.9608 0.1254 0.8963 1.0253 Eukaryote PL
dbCAN_4:HMMER 0.9608 0.1254 0.8963 1.0253 Eukaryote PL
dbCAN_4:DIAMOND 0.9485 0.1470 0.8730 1.0241 Eukaryote PL
dbCAN_4:dbCAN-sub 0.9608 0.1254 0.8963 1.0253 Eukaryote PL
CUPP 0.9196 0.2447 0.7938 1.0454 Eukaryote PL
dbCAN_2 0.9519 0.1408 0.9197 0.9840 All CE
dbCAN_2:HMMER 0.9525 0.1191 0.9253 0.9797 All CE
dbCAN_2:DIAMOND 0.9280 0.2089 0.8803 0.9757 All CE
dbCAN_2:Hotpep 0.9128 0.1671 0.8746 0.9510 All CE
dbCAN_3 0.9526 0.1391 0.9208 0.9844 All CE
dbCAN_3:HMMER 0.9602 0.1103 0.9350 0.9854 All CE
dbCAN_3:DIAMOND 0.9304 0.1864 0.8878 0.9730 All CE
dbCAN_3:eCAMI 0.9156 0.1909 0.8719 0.9592 All CE
dbCAN_4 0.9603 0.1103 0.9351 0.9855 All CE
dbCAN_4:HMMER 0.9603 0.1103 0.9351 0.9855 All CE
dbCAN_4:DIAMOND 0.9476 0.1506 0.9132 0.9820 All CE
dbCAN_4:dbCAN-sub 0.9592 0.1120 0.9336 0.9848 All CE
CUPP 0.9598 0.1130 0.9339 0.9856 All CE
dbCAN_2 0.9148 0.1833 0.8562 0.9734 Bacteria CE
dbCAN_2:HMMER 0.9218 0.1527 0.8729 0.9706 Bacteria CE
dbCAN_2:DIAMOND 0.9182 0.1867 0.8585 0.9779 Bacteria CE
dbCAN_2:Hotpep 0.8560 0.2018 0.7914 0.9205 Bacteria CE
dbCAN_3 0.9149 0.1821 0.8567 0.9732 Bacteria CE
dbCAN_3:HMMER 0.9322 0.1429 0.8865 0.9779 Bacteria CE
dbCAN_3:DIAMOND 0.9219 0.1752 0.8658 0.9779 Bacteria CE
dbCAN_3:eCAMI 0.8681 0.2000 0.8042 0.9321 Bacteria CE
dbCAN_4 0.9324 0.1429 0.8867 0.9781 Bacteria CE
dbCAN_4:HMMER 0.9324 0.1429 0.8867 0.9781 Bacteria CE
dbCAN_4:DIAMOND 0.9295 0.1739 0.8739 0.9851 Bacteria CE
dbCAN_4:dbCAN-sub 0.9321 0.1429 0.8864 0.9778 Bacteria CE
CUPP 0.9298 0.1452 0.8834 0.9762 Bacteria CE
dbCAN_2 0.9931 0.0417 0.9790 1.0072 Eukaryote CE
dbCAN_2:HMMER 0.9867 0.0459 0.9712 1.0023 Eukaryote CE
dbCAN_2:DIAMOND 0.9389 0.2333 0.8599 1.0178 Eukaryote CE
dbCAN_2:Hotpep 0.9759 0.0818 0.9482 1.0035 Eukaryote CE
dbCAN_3 0.9944 0.0333 0.9832 1.0057 Eukaryote CE
dbCAN_3:HMMER 0.9914 0.0377 0.9786 1.0041 Eukaryote CE
dbCAN_3:DIAMOND 0.9399 0.2002 0.8722 1.0076 Eukaryote CE
dbCAN_3:eCAMI 0.9683 0.1677 0.9115 1.0250 Eukaryote CE
dbCAN_4 0.9914 0.0377 0.9786 1.0041 Eukaryote CE
dbCAN_4:HMMER 0.9914 0.0377 0.9786 1.0041 Eukaryote CE
dbCAN_4:DIAMOND 0.9677 0.1189 0.9275 1.0079 Eukaryote CE
dbCAN_4:dbCAN-sub 0.9893 0.0486 0.9729 1.0058 Eukaryote CE
CUPP 0.9931 0.0417 0.9790 1.0072 Eukaryote CE
dbCAN_2 0.9208 0.1693 0.8716 0.9699 All AA
dbCAN_2:HMMER 0.9191 0.1607 0.8724 0.9658 All AA
dbCAN_2:DIAMOND 0.9228 0.1710 0.8731 0.9724 All AA
dbCAN_2:Hotpep 0.8983 0.2166 0.8354 0.9612 All AA
dbCAN_3 0.9236 0.1605 0.8770 0.9702 All AA
dbCAN_3:HMMER 0.9192 0.1624 0.8720 0.9664 All AA
dbCAN_3:DIAMOND 0.9171 0.1693 0.8679 0.9662 All AA
dbCAN_3:eCAMI 0.8514 0.2809 0.7699 0.9330 All AA
dbCAN_4 0.9198 0.1610 0.8731 0.9666 All AA
dbCAN_4:HMMER 0.9192 0.1624 0.8720 0.9664 All AA
dbCAN_4:DIAMOND 0.9196 0.1697 0.8703 0.9689 All AA
dbCAN_4:dbCAN-sub 0.9177 0.1620 0.8707 0.9648 All AA
CUPP 0.9184 0.1697 0.8691 0.9677 All AA
dbCAN_2 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:Hotpep 0.9091 0.3015 0.7065 1.1116 Bacteria AA
dbCAN_3 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:eCAMI 0.9091 0.3015 0.7065 1.1116 Bacteria AA
dbCAN_4 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:dbCAN-sub 1.0000 0.0000 1.0000 1.0000 Bacteria AA
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2 0.8972 0.1869 0.8349 0.9595 Eukaryote AA
dbCAN_2:HMMER 0.8950 0.1764 0.8362 0.9539 Eukaryote AA
dbCAN_2:DIAMOND 0.8998 0.1892 0.8367 0.9629 Eukaryote AA
dbCAN_2:Hotpep 0.8951 0.1896 0.8319 0.9583 Eukaryote AA
dbCAN_3 0.9009 0.1770 0.8419 0.9599 Eukaryote AA
dbCAN_3:HMMER 0.8952 0.1785 0.8357 0.9547 Eukaryote AA
dbCAN_3:DIAMOND 0.8925 0.1862 0.8304 0.9545 Eukaryote AA
dbCAN_3:eCAMI 0.8343 0.2765 0.7421 0.9264 Eukaryote AA
dbCAN_4 0.8960 0.1769 0.8370 0.9550 Eukaryote AA
dbCAN_4:HMMER 0.8952 0.1785 0.8357 0.9547 Eukaryote AA
dbCAN_4:DIAMOND 0.8957 0.1872 0.8333 0.9581 Eukaryote AA
dbCAN_4:dbCAN-sub 0.8932 0.1777 0.8340 0.9525 Eukaryote AA
CUPP 0.8942 0.1870 0.8318 0.9565 Eukaryote AA
dbCAN_2 0.9203 0.1333 0.8906 0.9500 All CBM
dbCAN_2:HMMER 0.8965 0.2304 0.8452 0.9478 All CBM
dbCAN_2:DIAMOND 0.9219 0.1502 0.8885 0.9553 All CBM
dbCAN_2:Hotpep 0.4696 0.1731 0.4311 0.5081 All CBM
dbCAN_3 0.9431 0.1355 0.9129 0.9732 All CBM
dbCAN_3:HMMER 0.8964 0.2304 0.8451 0.9476 All CBM
dbCAN_3:DIAMOND 0.9512 0.1283 0.9227 0.9797 All CBM
dbCAN_3:eCAMI 0.6615 0.2254 0.6113 0.7117 All CBM
dbCAN_4 0.9562 0.0919 0.9358 0.9767 All CBM
dbCAN_4:HMMER 0.9184 0.1999 0.8739 0.9629 All CBM
dbCAN_4:DIAMOND 0.9571 0.1273 0.9288 0.9854 All CBM
dbCAN_4:dbCAN-sub 0.9376 0.1151 0.9120 0.9632 All CBM
CUPP 0.0000 0.0000 0.0000 0.0000 All CBM
dbCAN_2 0.9517 0.1091 0.9168 0.9866 Bacteria CBM
dbCAN_2:HMMER 0.9380 0.1365 0.8944 0.9817 Bacteria CBM
dbCAN_2:DIAMOND 0.9551 0.1626 0.9031 1.0071 Bacteria CBM
dbCAN_2:Hotpep 0.4795 0.2000 0.4156 0.5435 Bacteria CBM
dbCAN_3 0.9384 0.1704 0.8839 0.9929 Bacteria CBM
dbCAN_3:HMMER 0.9377 0.1365 0.8941 0.9814 Bacteria CBM
dbCAN_3:DIAMOND 0.9600 0.1592 0.9091 1.0109 Bacteria CBM
dbCAN_3:eCAMI 0.6317 0.2308 0.5579 0.7055 Bacteria CBM
dbCAN_4 0.9699 0.0662 0.9487 0.9911 Bacteria CBM
dbCAN_4:HMMER 0.9524 0.1130 0.9163 0.9886 Bacteria CBM
dbCAN_4:DIAMOND 0.9604 0.1591 0.9095 1.0113 Bacteria CBM
dbCAN_4:dbCAN-sub 0.9624 0.0681 0.9406 0.9842 Bacteria CBM
CUPP 0.0000 0.0000 0.0000 0.0000 Bacteria CBM
dbCAN_2 0.8889 0.1486 0.8414 0.9364 Eukaryote CBM
dbCAN_2:HMMER 0.8550 0.2922 0.7616 0.9484 Eukaryote CBM
dbCAN_2:DIAMOND 0.8887 0.1303 0.8471 0.9304 Eukaryote CBM
dbCAN_2:Hotpep 0.4597 0.1431 0.4139 0.5055 Eukaryote CBM
dbCAN_3 0.9477 0.0901 0.9189 0.9765 Eukaryote CBM
dbCAN_3:HMMER 0.8550 0.2922 0.7616 0.9484 Eukaryote CBM
dbCAN_3:DIAMOND 0.9424 0.0885 0.9141 0.9707 Eukaryote CBM
dbCAN_3:eCAMI 0.6913 0.2188 0.6214 0.7613 Eukaryote CBM
dbCAN_4 0.9426 0.1111 0.9071 0.9781 Eukaryote CBM
dbCAN_4:HMMER 0.8844 0.2565 0.8023 0.9664 Eukaryote CBM
dbCAN_4:DIAMOND 0.9538 0.0865 0.9261 0.9815 Eukaryote CBM
dbCAN_4:dbCAN-sub 0.9128 0.1447 0.8665 0.9591 Eukaryote CBM
CUPP 0.0000 0.0000 0.0000 0.0000 Eukaryote CBM

8.2.4 F1-score
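
The F1-score is the harmonic mean of precision and sensitivity, equivalently 2TP / (2TP + FP + FN). The mean of per-test-set F1-scores is generally not the same as the F1-score computed from the mean precision and mean sensitivity. The sketch below illustrates this with two hypothetical test sets; all names and counts are illustrative, and the per-test-set averaging is an assumption about how the summary rows were produced.

```python
from statistics import mean


def f1_score(tp, fp, fn):
    """F1 from confusion-matrix counts; 0.0 if there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_from_rates(sens, prec):
    """Harmonic mean of sensitivity and precision."""
    return 2 * sens * prec / (sens + prec) if (sens + prec) else 0.0


# Two hypothetical test sets, given as (TP, FP, FN) counts.
test_sets = [(9, 0, 1), (5, 5, 0)]

mean_f1 = mean(f1_score(tp, fp, fn) for tp, fp, fn in test_sets)
mean_sens = mean(tp / (tp + fn) for tp, fp, fn in test_sets)
mean_prec = mean(tp / (tp + fp) for tp, fp, fn in test_sets)

print(f"mean of per-set F1:   {mean_f1:.4f}")
print(f"F1 of the mean rates: {f1_from_rates(mean_sens, mean_prec):.4f}")
```

The same gap appears in the reported values: for dbCAN_2 on All GH, the harmonic mean of the mean sensitivity (0.9354) and mean precision (0.9886) is about 0.9613, whereas the reported mean F1-score is 0.9583, which is consistent with F1 being computed per test set and then averaged.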

Table 8.9: Overall performance (measured by F1-score) of CAZy class classification by the CAZyme classifiers, per taxonomy group
Prediction_tool Mean Standard Deviation LowerCI UpperCI Tax_group CAZy_class
dbCAN_2 0.9583 0.0614 0.9446 0.9719 All GH
dbCAN_2:HMMER 0.9444 0.0587 0.9313 0.9574 All GH
dbCAN_2:DIAMOND 0.9563 0.0727 0.9401 0.9725 All GH
dbCAN_2:Hotpep 0.9136 0.0773 0.8964 0.9308 All GH
dbCAN_3 0.9708 0.0558 0.9583 0.9832 All GH
dbCAN_3:HMMER 0.9514 0.0596 0.9382 0.9647 All GH
dbCAN_3:DIAMOND 0.9776 0.0491 0.9666 0.9885 All GH
dbCAN_3:eCAMI 0.9223 0.0695 0.9069 0.9378 All GH
dbCAN_4 0.9671 0.0590 0.9539 0.9802 All GH
dbCAN_4:HMMER 0.9516 0.0596 0.9384 0.9649 All GH
dbCAN_4:DIAMOND 0.9764 0.0583 0.9634 0.9894 All GH
dbCAN_4:dbCAN-sub 0.9655 0.0581 0.9526 0.9785 All GH
CUPP 0.9461 0.0454 0.9360 0.9562 All GH
dbCAN_2 0.9592 0.0692 0.9371 0.9814 Bacteria GH
dbCAN_2:HMMER 0.9464 0.0769 0.9218 0.9710 Bacteria GH
dbCAN_2:DIAMOND 0.9535 0.0809 0.9277 0.9794 Bacteria GH
dbCAN_2:Hotpep 0.9308 0.0630 0.9106 0.9509 Bacteria GH
dbCAN_3 0.9718 0.0713 0.9490 0.9947 Bacteria GH
dbCAN_3:HMMER 0.9495 0.0782 0.9245 0.9745 Bacteria GH
dbCAN_3:DIAMOND 0.9723 0.0614 0.9527 0.9919 Bacteria GH
dbCAN_3:eCAMI 0.9423 0.0641 0.9218 0.9628 Bacteria GH
dbCAN_4 0.9682 0.0776 0.9434 0.9930 Bacteria GH
dbCAN_4:HMMER 0.9499 0.0782 0.9249 0.9749 Bacteria GH
dbCAN_4:DIAMOND 0.9694 0.0764 0.9449 0.9938 Bacteria GH
dbCAN_4:dbCAN-sub 0.9690 0.0761 0.9447 0.9933 Bacteria GH
CUPP 0.9516 0.0487 0.9360 0.9672 Bacteria GH
dbCAN_2 0.9573 0.0533 0.9403 0.9744 Eukaryote GH
dbCAN_2:HMMER 0.9423 0.0327 0.9319 0.9528 Eukaryote GH
dbCAN_2:DIAMOND 0.9590 0.0644 0.9384 0.9796 Eukaryote GH
dbCAN_2:Hotpep 0.8965 0.0868 0.8687 0.9242 Eukaryote GH
dbCAN_3 0.9697 0.0350 0.9585 0.9809 Eukaryote GH
dbCAN_3:HMMER 0.9534 0.0329 0.9429 0.9639 Eukaryote GH
dbCAN_3:DIAMOND 0.9828 0.0327 0.9724 0.9933 Eukaryote GH
dbCAN_3:eCAMI 0.9023 0.0698 0.8800 0.9247 Eukaryote GH
dbCAN_4 0.9659 0.0319 0.9557 0.9761 Eukaryote GH
dbCAN_4:HMMER 0.9534 0.0329 0.9429 0.9639 Eukaryote GH
dbCAN_4:DIAMOND 0.9834 0.0307 0.9736 0.9932 Eukaryote GH
dbCAN_4:dbCAN-sub 0.9621 0.0320 0.9519 0.9723 Eukaryote GH
CUPP 0.9405 0.0416 0.9272 0.9538 Eukaryote GH
dbCAN_2 0.9258 0.1016 0.9032 0.9484 All GT
dbCAN_2:HMMER 0.9152 0.0845 0.8964 0.9340 All GT
dbCAN_2:DIAMOND 0.9463 0.1124 0.9213 0.9713 All GT
dbCAN_2:Hotpep 0.8209 0.1383 0.7901 0.8517 All GT
dbCAN_3 0.9606 0.0790 0.9430 0.9782 All GT
dbCAN_3:HMMER 0.9169 0.0839 0.8982 0.9355 All GT
dbCAN_3:DIAMOND 0.9764 0.0755 0.9596 0.9932 All GT
dbCAN_3:eCAMI 0.9046 0.1076 0.8806 0.9285 All GT
dbCAN_4 0.9677 0.0753 0.9510 0.9845 All GT
dbCAN_4:HMMER 0.9170 0.0838 0.8984 0.9357 All GT
dbCAN_4:DIAMOND 0.9750 0.0781 0.9577 0.9924 All GT
dbCAN_4:dbCAN-sub 0.9664 0.0656 0.9518 0.9810 All GT
CUPP 0.9107 0.0758 0.8938 0.9276 All GT
dbCAN_2 0.9294 0.1020 0.8968 0.9620 Bacteria GT
dbCAN_2:HMMER 0.9110 0.1012 0.8786 0.9434 Bacteria GT
dbCAN_2:DIAMOND 0.9506 0.1147 0.9139 0.9873 Bacteria GT
dbCAN_2:Hotpep 0.7903 0.1418 0.7449 0.8357 Bacteria GT
dbCAN_3 0.9584 0.0970 0.9274 0.9894 Bacteria GT
dbCAN_3:HMMER 0.9106 0.1007 0.8784 0.9428 Bacteria GT
dbCAN_3:DIAMOND 0.9758 0.0925 0.9462 1.0054 Bacteria GT
dbCAN_3:eCAMI 0.9112 0.1075 0.8768 0.9455 Bacteria GT
dbCAN_4 0.9647 0.0958 0.9340 0.9953 Bacteria GT
dbCAN_4:HMMER 0.9112 0.1007 0.8790 0.9434 Bacteria GT
dbCAN_4:DIAMOND 0.9695 0.0962 0.9387 1.0003 Bacteria GT
dbCAN_4:dbCAN-sub 0.9660 0.0801 0.9404 0.9917 Bacteria GT
CUPP 0.9292 0.0713 0.9064 0.9520 Bacteria GT
dbCAN_2 0.9222 0.1023 0.8895 0.9550 Eukaryote GT
dbCAN_2:HMMER 0.9194 0.0648 0.8987 0.9401 Eukaryote GT
dbCAN_2:DIAMOND 0.9420 0.1112 0.9064 0.9776 Eukaryote GT
dbCAN_2:Hotpep 0.8515 0.1293 0.8101 0.8928 Eukaryote GT
dbCAN_3 0.9628 0.0569 0.9446 0.9810 Eukaryote GT
dbCAN_3:HMMER 0.9231 0.0636 0.9028 0.9435 Eukaryote GT
dbCAN_3:DIAMOND 0.9769 0.0547 0.9595 0.9944 Eukaryote GT
dbCAN_3:eCAMI 0.8980 0.1085 0.8633 0.9327 Eukaryote GT
dbCAN_4 0.9708 0.0479 0.9555 0.9861 Eukaryote GT
dbCAN_4:HMMER 0.9229 0.0634 0.9026 0.9432 Eukaryote GT
dbCAN_4:DIAMOND 0.9806 0.0551 0.9630 0.9982 Eukaryote GT
dbCAN_4:dbCAN-sub 0.9667 0.0481 0.9513 0.9821 Eukaryote GT
CUPP 0.8923 0.0766 0.8678 0.9168 Eukaryote GT
dbCAN_2 0.9073 0.2183 0.8432 0.9714 All PL
dbCAN_2:HMMER 0.9250 0.1788 0.8725 0.9775 All PL
dbCAN_2:DIAMOND 0.8889 0.2530 0.8146 0.9632 All PL
dbCAN_2:Hotpep 0.8803 0.2257 0.8140 0.9465 All PL
dbCAN_3 0.9826 0.0689 0.9624 1.0028 All PL
dbCAN_3:HMMER 0.9741 0.0781 0.9511 0.9970 All PL
dbCAN_3:DIAMOND 0.9754 0.0792 0.9521 0.9986 All PL
dbCAN_3:eCAMI 0.8473 0.2533 0.7737 0.9208 All PL
dbCAN_4 0.9741 0.0781 0.9511 0.9970 All PL
dbCAN_4:HMMER 0.9741 0.0781 0.9511 0.9970 All PL
dbCAN_4:DIAMOND 0.9825 0.0646 0.9635 1.0014 All PL
dbCAN_4:dbCAN-sub 0.9747 0.0782 0.9518 0.9977 All PL
CUPP 0.8850 0.2288 0.8178 0.9522 All PL
dbCAN_2 0.9329 0.1918 0.8612 1.0045 Bacteria PL
dbCAN_2:HMMER 0.9495 0.1139 0.9070 0.9921 Bacteria PL
dbCAN_2:DIAMOND 0.9373 0.1869 0.8675 1.0071 Bacteria PL
dbCAN_2:Hotpep 0.9175 0.1327 0.8679 0.9670 Bacteria PL
dbCAN_3 0.9869 0.0610 0.9641 1.0096 Bacteria PL
dbCAN_3:HMMER 0.9735 0.0770 0.9448 1.0023 Bacteria PL
dbCAN_3:DIAMOND 0.9913 0.0369 0.9775 1.0051 Bacteria PL
dbCAN_3:eCAMI 0.8734 0.2083 0.7970 0.9498 Bacteria PL
dbCAN_4 0.9735 0.0770 0.9448 1.0023 Bacteria PL
dbCAN_4:HMMER 0.9735 0.0770 0.9448 1.0023 Bacteria PL
dbCAN_4:DIAMOND 0.9913 0.0369 0.9775 1.0051 Bacteria PL
dbCAN_4:dbCAN-sub 0.9745 0.0771 0.9457 1.0034 Bacteria PL
CUPP 0.8899 0.2178 0.8085 0.9712 Bacteria PL
dbCAN_2 0.8622 0.2587 0.7292 0.9952 Eukaryote PL
dbCAN_2:HMMER 0.8818 0.2556 0.7504 1.0132 Eukaryote PL
dbCAN_2:DIAMOND 0.8034 0.3294 0.6340 0.9727 Eukaryote PL
dbCAN_2:Hotpep 0.8146 0.3277 0.6461 0.9831 Eukaryote PL
dbCAN_3 0.9750 0.0825 0.9326 1.0174 Eukaryote PL
dbCAN_3:HMMER 0.9750 0.0825 0.9326 1.0174 Eukaryote PL
dbCAN_3:DIAMOND 0.9472 0.1194 0.8858 1.0086 Eukaryote PL
dbCAN_3:eCAMI 0.7996 0.3214 0.6344 0.9649 Eukaryote PL
dbCAN_4 0.9750 0.0825 0.9326 1.0174 Eukaryote PL
dbCAN_4:HMMER 0.9750 0.0825 0.9326 1.0174 Eukaryote PL
dbCAN_4:DIAMOND 0.9668 0.0954 0.9178 1.0159 Eukaryote PL
dbCAN_4:dbCAN-sub 0.9750 0.0825 0.9326 1.0174 Eukaryote PL
CUPP 0.8764 0.2539 0.7459 1.0070 Eukaryote PL
dbCAN_2 0.9224 0.1305 0.8926 0.9522 All CE
dbCAN_2:HMMER 0.9248 0.1119 0.8993 0.9504 All CE
dbCAN_2:DIAMOND 0.8636 0.2234 0.8126 0.9147 All CE
dbCAN_2:Hotpep 0.8555 0.1705 0.8165 0.8944 All CE
dbCAN_3 0.9236 0.1397 0.8916 0.9555 All CE
dbCAN_3:HMMER 0.9302 0.1073 0.9057 0.9547 All CE
dbCAN_3:DIAMOND 0.9152 0.1682 0.8768 0.9537 All CE
dbCAN_3:eCAMI 0.8314 0.2020 0.7852 0.8775 All CE
dbCAN_4 0.9638 0.0854 0.9443 0.9833 All CE
dbCAN_4:HMMER 0.9487 0.0957 0.9268 0.9706 All CE
dbCAN_4:DIAMOND 0.9507 0.1192 0.9234 0.9779 All CE
dbCAN_4:dbCAN-sub 0.9617 0.0863 0.9420 0.9814 All CE
CUPP 0.9250 0.1095 0.9000 0.9500 All CE
dbCAN_2 0.9052 0.1414 0.8600 0.9505 Bacteria CE
dbCAN_2:HMMER 0.8823 0.1265 0.8419 0.9228 Bacteria CE
dbCAN_2:DIAMOND 0.8759 0.1626 0.8239 0.9279 Bacteria CE
dbCAN_2:Hotpep 0.8730 0.1582 0.8224 0.9236 Bacteria CE
dbCAN_3 0.8941 0.1625 0.8421 0.9460 Bacteria CE
dbCAN_3:HMMER 0.8875 0.1215 0.8486 0.9264 Bacteria CE
dbCAN_3:DIAMOND 0.9092 0.1444 0.8630 0.9554 Bacteria CE
dbCAN_3:eCAMI 0.8444 0.1781 0.7874 0.9013 Bacteria CE
dbCAN_4 0.9400 0.1099 0.9049 0.9751 Bacteria CE
dbCAN_4:HMMER 0.9143 0.1191 0.8762 0.9524 Bacteria CE
dbCAN_4:DIAMOND 0.9299 0.1425 0.8843 0.9754 Bacteria CE
dbCAN_4:dbCAN-sub 0.9381 0.1101 0.9029 0.9733 Bacteria CE
CUPP 0.9000 0.1196 0.8617 0.9382 Bacteria CE
dbCAN_2 0.9415 0.1162 0.9022 0.9808 Eukaryote CE
dbCAN_2:HMMER 0.9721 0.0682 0.9490 0.9952 Eukaryote CE
dbCAN_2:DIAMOND 0.8501 0.2777 0.7561 0.9440 Eukaryote CE
dbCAN_2:Hotpep 0.8360 0.1836 0.7739 0.8981 Eukaryote CE
dbCAN_3 0.9563 0.1017 0.9219 0.9908 Eukaryote CE
dbCAN_3:HMMER 0.9777 0.0616 0.9568 0.9985 Eukaryote CE
dbCAN_3:DIAMOND 0.9219 0.1931 0.8566 0.9873 Eukaryote CE
dbCAN_3:eCAMI 0.8169 0.2274 0.7400 0.8939 Eukaryote CE
dbCAN_4 0.9902 0.0287 0.9805 1.0000 Eukaryote CE
dbCAN_4:HMMER 0.9869 0.0309 0.9765 0.9974 Eukaryote CE
dbCAN_4:DIAMOND 0.9738 0.0823 0.9459 1.0016 Eukaryote CE
dbCAN_4:dbCAN-sub 0.9880 0.0335 0.9766 0.9993 Eukaryote CE
CUPP 0.9528 0.0907 0.9221 0.9835 Eukaryote CE
dbCAN_2 0.9104 0.1322 0.8720 0.9488 All AA
dbCAN_2:HMMER 0.9173 0.1137 0.8843 0.9503 All AA
dbCAN_2:DIAMOND 0.8766 0.1603 0.8301 0.9232 All AA
dbCAN_2:Hotpep 0.8733 0.1924 0.8174 0.9292 All AA
dbCAN_3 0.9436 0.1100 0.9117 0.9756 All AA
dbCAN_3:HMMER 0.9434 0.1112 0.9111 0.9757 All AA
dbCAN_3:DIAMOND 0.9413 0.1144 0.9080 0.9745 All AA
dbCAN_3:eCAMI 0.8006 0.2558 0.7263 0.8749 All AA
dbCAN_4 0.9464 0.1085 0.9149 0.9779 All AA
dbCAN_4:HMMER 0.9434 0.1112 0.9111 0.9757 All AA
dbCAN_4:DIAMOND 0.9306 0.1226 0.8950 0.9662 All AA
dbCAN_4:dbCAN-sub 0.9430 0.1090 0.9113 0.9746 All AA
CUPP 0.9011 0.1376 0.8611 0.9410 All AA
dbCAN_2 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:Hotpep 0.9091 0.3015 0.7065 1.1116 Bacteria AA
dbCAN_3 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:eCAMI 0.9091 0.3015 0.7065 1.1116 Bacteria AA
dbCAN_4 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:DIAMOND 0.9697 0.1005 0.9022 1.0372 Bacteria AA
dbCAN_4:dbCAN-sub 1.0000 0.0000 1.0000 1.0000 Bacteria AA
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2 0.8838 0.1401 0.8371 0.9305 Eukaryote AA
dbCAN_2:HMMER 0.8927 0.1190 0.8530 0.9324 Eukaryote AA
dbCAN_2:DIAMOND 0.8399 0.1659 0.7846 0.8952 Eukaryote AA
dbCAN_2:Hotpep 0.8626 0.1503 0.8125 0.9127 Eukaryote AA
dbCAN_3 0.9269 0.1205 0.8867 0.9670 Eukaryote AA
dbCAN_3:HMMER 0.9266 0.1219 0.8860 0.9673 Eukaryote AA
dbCAN_3:DIAMOND 0.9238 0.1254 0.8820 0.9656 Eukaryote AA
dbCAN_3:eCAMI 0.7683 0.2356 0.6898 0.8469 Eukaryote AA
dbCAN_4 0.9305 0.1193 0.8907 0.9703 Eukaryote AA
dbCAN_4:HMMER 0.9266 0.1219 0.8860 0.9673 Eukaryote AA
dbCAN_4:DIAMOND 0.9190 0.1274 0.8765 0.9615 Eukaryote AA
dbCAN_4:dbCAN-sub 0.9260 0.1193 0.8863 0.9658 Eukaryote AA
CUPP 0.8716 0.1443 0.8235 0.9197 Eukaryote AA
dbCAN_2 0.7970 0.1542 0.7627 0.8314 All CBM
dbCAN_2:HMMER 0.5414 0.2150 0.4935 0.5892 All CBM
dbCAN_2:DIAMOND 0.8325 0.1762 0.7933 0.8717 All CBM
dbCAN_2:Hotpep 0.5508 0.1723 0.5125 0.5892 All CBM
dbCAN_3 0.8354 0.1546 0.8010 0.8698 All CBM
dbCAN_3:HMMER 0.5425 0.2134 0.4950 0.5900 All CBM
dbCAN_3:DIAMOND 0.8986 0.1383 0.8678 0.9294 All CBM
dbCAN_3:eCAMI 0.6810 0.2079 0.6347 0.7272 All CBM
dbCAN_4 0.8547 0.1392 0.8237 0.8857 All CBM
dbCAN_4:HMMER 0.5954 0.2202 0.5464 0.6444 All CBM
dbCAN_4:DIAMOND 0.9072 0.1362 0.8768 0.9375 All CBM
dbCAN_4:dbCAN-sub 0.8576 0.1492 0.8244 0.8908 All CBM
CUPP 0.0000 0.0000 0.0000 0.0000 All CBM
dbCAN_2 0.8144 0.1600 0.7632 0.8656 Bacteria CBM
dbCAN_2:HMMER 0.6348 0.1986 0.5712 0.6983 Bacteria CBM
dbCAN_2:DIAMOND 0.8327 0.1875 0.7727 0.8926 Bacteria CBM
dbCAN_2:Hotpep 0.5583 0.2060 0.4925 0.6242 Bacteria CBM
dbCAN_3 0.8330 0.1861 0.7734 0.8925 Bacteria CBM
dbCAN_3:HMMER 0.6337 0.1981 0.5704 0.6971 Bacteria CBM
dbCAN_3:DIAMOND 0.8648 0.1691 0.8108 0.9189 Bacteria CBM
dbCAN_3:eCAMI 0.6709 0.2250 0.5989 0.7428 Bacteria CBM
dbCAN_4 0.9296 0.0604 0.9103 0.9489 Bacteria CBM
dbCAN_4:HMMER 0.7163 0.1850 0.6571 0.7755 Bacteria CBM
dbCAN_4:DIAMOND 0.8607 0.1698 0.8063 0.9150 Bacteria CBM
dbCAN_4:dbCAN-sub 0.9484 0.0628 0.9283 0.9685 Bacteria CBM
CUPP 0.0000 0.0000 0.0000 0.0000 Bacteria CBM
dbCAN_2 0.7797 0.1483 0.7322 0.8271 Eukaryote CBM
dbCAN_2:HMMER 0.4480 0.1904 0.3871 0.5089 Eukaryote CBM
dbCAN_2:DIAMOND 0.8324 0.1666 0.7791 0.8857 Eukaryote CBM
dbCAN_2:Hotpep 0.5433 0.1325 0.5009 0.5856 Eukaryote CBM
dbCAN_3 0.8378 0.1173 0.8003 0.8754 Eukaryote CBM
dbCAN_3:HMMER 0.4512 0.1895 0.3906 0.5118 Eukaryote CBM
dbCAN_3:DIAMOND 0.9323 0.0884 0.9040 0.9606 Eukaryote CBM
dbCAN_3:eCAMI 0.6911 0.1917 0.6297 0.7524 Eukaryote CBM
dbCAN_4 0.7798 0.1552 0.7302 0.8295 Eukaryote CBM
dbCAN_4:HMMER 0.4745 0.1843 0.4155 0.5334 Eukaryote CBM
dbCAN_4:DIAMOND 0.9537 0.0657 0.9327 0.9747 Eukaryote CBM
dbCAN_4:dbCAN-sub 0.7669 0.1557 0.7171 0.8166 Eukaryote CBM
CUPP 0.0000 0.0000 0.0000 0.0000 Eukaryote CBM

8.2.5 Accuracy

Table 8.10: Overall performance (represented by the Accuracy) of CAZy class classification by CAZyme classifiers per taxonomy group
Prediction_tool Mean Standard Deviation LowerCI UpperCI Tax_group CAZy_class
dbCAN_2 0.9658 0.0361 0.9578 0.9738 All GH
dbCAN_2:HMMER 0.9536 0.0330 0.9463 0.9609 All GH
dbCAN_2:DIAMOND 0.9644 0.0415 0.9551 0.9736 All GH
dbCAN_2:Hotpep 0.9297 0.0495 0.9187 0.9407 All GH
dbCAN_3 0.9768 0.0304 0.9701 0.9836 All GH
dbCAN_3:HMMER 0.9599 0.0331 0.9525 0.9672 All GH
dbCAN_3:DIAMOND 0.9812 0.0295 0.9747 0.9878 All GH
dbCAN_3:eCAMI 0.9364 0.0485 0.9256 0.9472 All GH
dbCAN_4 0.9743 0.0307 0.9675 0.9812 All GH
dbCAN_4:HMMER 0.9601 0.0328 0.9528 0.9674 All GH
dbCAN_4:DIAMOND 0.9812 0.0304 0.9745 0.9880 All GH
dbCAN_4:dbCAN-sub 0.9731 0.0297 0.9665 0.9797 All GH
CUPP 0.9540 0.0293 0.9475 0.9605 All GH
dbCAN_2 0.9660 0.0387 0.9536 0.9784 Bacteria GH
dbCAN_2:HMMER 0.9544 0.0399 0.9416 0.9672 Bacteria GH
dbCAN_2:DIAMOND 0.9611 0.0445 0.9469 0.9753 Bacteria GH
dbCAN_2:Hotpep 0.9375 0.0449 0.9231 0.9518 Bacteria GH
dbCAN_3 0.9788 0.0346 0.9678 0.9899 Bacteria GH
dbCAN_3:HMMER 0.9579 0.0409 0.9448 0.9710 Bacteria GH
dbCAN_3:DIAMOND 0.9773 0.0324 0.9669 0.9877 Bacteria GH
dbCAN_3:eCAMI 0.9472 0.0502 0.9312 0.9633 Bacteria GH
dbCAN_4 0.9762 0.0378 0.9641 0.9883 Bacteria GH
dbCAN_4:HMMER 0.9584 0.0405 0.9454 0.9713 Bacteria GH
dbCAN_4:DIAMOND 0.9766 0.0361 0.9650 0.9881 Bacteria GH
dbCAN_4:dbCAN-sub 0.9765 0.0361 0.9649 0.9880 Bacteria GH
CUPP 0.9555 0.0313 0.9455 0.9655 Bacteria GH
dbCAN_2 0.9656 0.0337 0.9548 0.9763 Eukaryote GH
dbCAN_2:HMMER 0.9528 0.0247 0.9449 0.9607 Eukaryote GH
dbCAN_2:DIAMOND 0.9676 0.0386 0.9553 0.9800 Eukaryote GH
dbCAN_2:Hotpep 0.9219 0.0531 0.9050 0.9389 Eukaryote GH
dbCAN_3 0.9748 0.0259 0.9666 0.9831 Eukaryote GH
dbCAN_3:HMMER 0.9619 0.0231 0.9545 0.9692 Eukaryote GH
dbCAN_3:DIAMOND 0.9852 0.0261 0.9768 0.9935 Eukaryote GH
dbCAN_3:eCAMI 0.9255 0.0446 0.9113 0.9398 Eukaryote GH
dbCAN_4 0.9725 0.0218 0.9655 0.9795 Eukaryote GH
dbCAN_4:HMMER 0.9619 0.0231 0.9545 0.9692 Eukaryote GH
dbCAN_4:DIAMOND 0.9859 0.0230 0.9785 0.9933 Eukaryote GH
dbCAN_4:dbCAN-sub 0.9698 0.0215 0.9630 0.9767 Eukaryote GH
CUPP 0.9525 0.0275 0.9437 0.9613 Eukaryote GH
dbCAN_2 0.9550 0.0770 0.9379 0.9721 All GT
dbCAN_2:HMMER 0.9498 0.0598 0.9365 0.9631 All GT
dbCAN_2:DIAMOND 0.9672 0.0821 0.9489 0.9854 All GT
dbCAN_2:Hotpep 0.9031 0.0908 0.8829 0.9233 All GT
dbCAN_3 0.9751 0.0585 0.9621 0.9881 All GT
dbCAN_3:HMMER 0.9503 0.0597 0.9370 0.9635 All GT
dbCAN_3:DIAMOND 0.9848 0.0567 0.9722 0.9974 All GT
dbCAN_3:eCAMI 0.9413 0.0822 0.9230 0.9596 All GT
dbCAN_4 0.9781 0.0572 0.9654 0.9909 All GT
dbCAN_4:HMMER 0.9503 0.0597 0.9370 0.9635 All GT
dbCAN_4:DIAMOND 0.9838 0.0577 0.9710 0.9966 All GT
dbCAN_4:dbCAN-sub 0.9773 0.0519 0.9657 0.9889 All GT
CUPP 0.9462 0.0579 0.9333 0.9591 All GT
dbCAN_2 0.9581 0.0752 0.9340 0.9821 Bacteria GT
dbCAN_2:HMMER 0.9469 0.0757 0.9227 0.9711 Bacteria GT
dbCAN_2:DIAMOND 0.9704 0.0804 0.9447 0.9961 Bacteria GT
dbCAN_2:Hotpep 0.8982 0.0676 0.8766 0.9199 Bacteria GT
dbCAN_3 0.9734 0.0739 0.9497 0.9970 Bacteria GT
dbCAN_3:HMMER 0.9463 0.0752 0.9223 0.9704 Bacteria GT
dbCAN_3:DIAMOND 0.9827 0.0721 0.9596 1.0057 Bacteria GT
dbCAN_3:eCAMI 0.9521 0.0640 0.9317 0.9726 Bacteria GT
dbCAN_4 0.9764 0.0732 0.9530 0.9998 Bacteria GT
dbCAN_4:HMMER 0.9466 0.0753 0.9225 0.9706 Bacteria GT
dbCAN_4:DIAMOND 0.9797 0.0738 0.9561 1.0033 Bacteria GT
dbCAN_4:dbCAN-sub 0.9770 0.0650 0.9562 0.9978 Bacteria GT
CUPP 0.9602 0.0488 0.9445 0.9758 Bacteria GT
dbCAN_2 0.9520 0.0795 0.9265 0.9774 Eukaryote GT
dbCAN_2:HMMER 0.9527 0.0386 0.9403 0.9650 Eukaryote GT
dbCAN_2:DIAMOND 0.9639 0.0847 0.9368 0.9910 Eukaryote GT
dbCAN_2:Hotpep 0.9079 0.1099 0.8728 0.9430 Eukaryote GT
dbCAN_3 0.9768 0.0384 0.9646 0.9891 Eukaryote GT
dbCAN_3:HMMER 0.9542 0.0390 0.9417 0.9667 Eukaryote GT
dbCAN_3:DIAMOND 0.9869 0.0362 0.9754 0.9985 Eukaryote GT
dbCAN_3:eCAMI 0.9304 0.0967 0.8995 0.9614 Eukaryote GT
dbCAN_4 0.9798 0.0354 0.9685 0.9911 Eukaryote GT
dbCAN_4:HMMER 0.9540 0.0389 0.9415 0.9664 Eukaryote GT
dbCAN_4:DIAMOND 0.9879 0.0354 0.9766 0.9992 Eukaryote GT
dbCAN_4:dbCAN-sub 0.9776 0.0352 0.9664 0.9889 Eukaryote GT
CUPP 0.9322 0.0633 0.9120 0.9525 Eukaryote GT
dbCAN_2 0.9959 0.0070 0.9938 0.9979 All PL
dbCAN_2:HMMER 0.9963 0.0062 0.9944 0.9981 All PL
dbCAN_2:DIAMOND 0.9954 0.0073 0.9933 0.9976 All PL
dbCAN_2:Hotpep 0.9929 0.0131 0.9891 0.9968 All PL
dbCAN_3 0.9990 0.0030 0.9981 0.9999 All PL
dbCAN_3:HMMER 0.9986 0.0035 0.9975 0.9996 All PL
dbCAN_3:DIAMOND 0.9985 0.0041 0.9973 0.9997 All PL
dbCAN_3:eCAMI 0.9920 0.0129 0.9883 0.9958 All PL
dbCAN_4 0.9986 0.0035 0.9975 0.9996 All PL
dbCAN_4:HMMER 0.9986 0.0035 0.9975 0.9996 All PL
dbCAN_4:DIAMOND 0.9988 0.0039 0.9976 0.9999 All PL
dbCAN_4:dbCAN-sub 0.9988 0.0032 0.9978 0.9997 All PL
CUPP 0.9941 0.0096 0.9913 0.9970 All PL
dbCAN_2 0.9957 0.0077 0.9928 0.9986 Bacteria PL
dbCAN_2:HMMER 0.9960 0.0067 0.9935 0.9985 Bacteria PL
dbCAN_2:DIAMOND 0.9957 0.0077 0.9928 0.9986 Bacteria PL
dbCAN_2:Hotpep 0.9917 0.0157 0.9859 0.9976 Bacteria PL
dbCAN_3 0.9990 0.0030 0.9979 1.0001 Bacteria PL
dbCAN_3:HMMER 0.9983 0.0038 0.9969 0.9997 Bacteria PL
dbCAN_3:DIAMOND 0.9990 0.0030 0.9979 1.0001 Bacteria PL
dbCAN_3:eCAMI 0.9911 0.0153 0.9854 0.9967 Bacteria PL
dbCAN_4 0.9983 0.0038 0.9969 0.9997 Bacteria PL
dbCAN_4:HMMER 0.9983 0.0038 0.9969 0.9997 Bacteria PL
dbCAN_4:DIAMOND 0.9990 0.0030 0.9979 1.0001 Bacteria PL
dbCAN_4:dbCAN-sub 0.9987 0.0034 0.9974 1.0000 Bacteria PL
CUPP 0.9930 0.0112 0.9888 0.9972 Bacteria PL
dbCAN_2 0.9962 0.0056 0.9933 0.9990 Eukaryote PL
dbCAN_2:HMMER 0.9967 0.0055 0.9939 0.9995 Eukaryote PL
dbCAN_2:DIAMOND 0.9950 0.0067 0.9915 0.9984 Eukaryote PL
dbCAN_2:Hotpep 0.9950 0.0066 0.9916 0.9984 Eukaryote PL
dbCAN_3 0.9989 0.0030 0.9974 1.0005 Eukaryote PL
dbCAN_3:HMMER 0.9989 0.0030 0.9974 1.0005 Eukaryote PL
dbCAN_3:DIAMOND 0.9977 0.0055 0.9948 1.0006 Eukaryote PL
dbCAN_3:eCAMI 0.9938 0.0065 0.9905 0.9972 Eukaryote PL
dbCAN_4 0.9990 0.0029 0.9974 1.0005 Eukaryote PL
dbCAN_4:HMMER 0.9989 0.0030 0.9974 1.0005 Eukaryote PL
dbCAN_4:DIAMOND 0.9983 0.0052 0.9956 1.0010 Eukaryote PL
dbCAN_4:dbCAN-sub 0.9990 0.0029 0.9975 1.0005 Eukaryote PL
CUPP 0.9961 0.0056 0.9933 0.9990 Eukaryote PL
dbCAN_2 0.9893 0.0225 0.9842 0.9945 All CE
dbCAN_2:HMMER 0.9901 0.0189 0.9858 0.9945 All CE
dbCAN_2:DIAMOND 0.9862 0.0242 0.9807 0.9918 All CE
dbCAN_2:Hotpep 0.9821 0.0251 0.9764 0.9878 All CE
dbCAN_3 0.9896 0.0220 0.9846 0.9946 All CE
dbCAN_3:HMMER 0.9910 0.0177 0.9870 0.9951 All CE
dbCAN_3:DIAMOND 0.9890 0.0231 0.9837 0.9943 All CE
dbCAN_3:eCAMI 0.9818 0.0249 0.9761 0.9875 All CE
dbCAN_4 0.9943 0.0172 0.9904 0.9982 All CE
dbCAN_4:HMMER 0.9929 0.0173 0.9889 0.9968 All CE
dbCAN_4:DIAMOND 0.9919 0.0218 0.9870 0.9969 All CE
dbCAN_4:dbCAN-sub 0.9938 0.0175 0.9898 0.9978 All CE
CUPP 0.9900 0.0187 0.9857 0.9943 All CE
dbCAN_2 0.9849 0.0283 0.9758 0.9939 Bacteria CE
dbCAN_2:HMMER 0.9840 0.0239 0.9764 0.9916 Bacteria CE
dbCAN_2:DIAMOND 0.9824 0.0282 0.9734 0.9915 Bacteria CE
dbCAN_2:Hotpep 0.9787 0.0309 0.9688 0.9886 Bacteria CE
dbCAN_3 0.9841 0.0280 0.9752 0.9931 Bacteria CE
dbCAN_3:HMMER 0.9852 0.0224 0.9780 0.9923 Bacteria CE
dbCAN_3:DIAMOND 0.9864 0.0268 0.9778 0.9950 Bacteria CE
dbCAN_3:eCAMI 0.9777 0.0303 0.9680 0.9874 Bacteria CE
dbCAN_4 0.9904 0.0228 0.9831 0.9977 Bacteria CE
dbCAN_4:HMMER 0.9882 0.0225 0.9810 0.9954 Bacteria CE
dbCAN_4:DIAMOND 0.9886 0.0263 0.9801 0.9970 Bacteria CE
dbCAN_4:dbCAN-sub 0.9899 0.0230 0.9826 0.9973 Bacteria CE
CUPP 0.9852 0.0236 0.9777 0.9927 Bacteria CE
dbCAN_2 0.9943 0.0119 0.9902 0.9983 Eukaryote CE
dbCAN_2:HMMER 0.9970 0.0062 0.9949 0.9991 Eukaryote CE
dbCAN_2:DIAMOND 0.9905 0.0182 0.9843 0.9966 Eukaryote CE
dbCAN_2:Hotpep 0.9859 0.0159 0.9805 0.9913 Eukaryote CE
dbCAN_3 0.9957 0.0095 0.9924 0.9989 Eukaryote CE
dbCAN_3:HMMER 0.9975 0.0054 0.9957 0.9994 Eukaryote CE
dbCAN_3:DIAMOND 0.9919 0.0181 0.9858 0.9980 Eukaryote CE
dbCAN_3:eCAMI 0.9863 0.0162 0.9808 0.9918 Eukaryote CE
dbCAN_4 0.9986 0.0041 0.9972 1.0000 Eukaryote CE
dbCAN_4:HMMER 0.9981 0.0046 0.9965 0.9996 Eukaryote CE
dbCAN_4:DIAMOND 0.9957 0.0147 0.9907 1.0007 Eukaryote CE
dbCAN_4:dbCAN-sub 0.9981 0.0055 0.9963 1.0000 Eukaryote CE
CUPP 0.9953 0.0086 0.9924 0.9983 Eukaryote CE
dbCAN_2 0.9856 0.0204 0.9796 0.9915 All AA
dbCAN_2:HMMER 0.9856 0.0202 0.9797 0.9914 All AA
dbCAN_2:DIAMOND 0.9837 0.0212 0.9775 0.9899 All AA
dbCAN_2:Hotpep 0.9827 0.0227 0.9761 0.9893 All AA
dbCAN_3 0.9901 0.0188 0.9846 0.9955 All AA
dbCAN_3:HMMER 0.9901 0.0196 0.9844 0.9958 All AA
dbCAN_3:DIAMOND 0.9897 0.0200 0.9839 0.9955 All AA
dbCAN_3:eCAMI 0.9791 0.0233 0.9724 0.9859 All AA
dbCAN_4 0.9907 0.0188 0.9853 0.9962 All AA
dbCAN_4:HMMER 0.9901 0.0196 0.9844 0.9958 All AA
dbCAN_4:DIAMOND 0.9897 0.0198 0.9840 0.9954 All AA
dbCAN_4:dbCAN-sub 0.9902 0.0189 0.9847 0.9956 All AA
CUPP 0.9841 0.0215 0.9778 0.9903 All AA
dbCAN_2 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2:Hotpep 0.9991 0.0030 0.9971 1.0011 Bacteria AA
dbCAN_3 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:DIAMOND 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_3:eCAMI 0.9991 0.0030 0.9971 1.0011 Bacteria AA
dbCAN_4 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:HMMER 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_4:DIAMOND 0.9991 0.0029 0.9972 1.0011 Bacteria AA
dbCAN_4:dbCAN-sub 1.0000 0.0000 1.0000 1.0000 Bacteria AA
CUPP 1.0000 0.0000 1.0000 1.0000 Bacteria AA
dbCAN_2 0.9813 0.0215 0.9741 0.9884 Eukaryote AA
dbCAN_2:HMMER 0.9813 0.0212 0.9742 0.9884 Eukaryote AA
dbCAN_2:DIAMOND 0.9789 0.0220 0.9715 0.9862 Eukaryote AA
dbCAN_2:Hotpep 0.9778 0.0237 0.9699 0.9857 Eukaryote AA
dbCAN_3 0.9871 0.0206 0.9803 0.9940 Eukaryote AA
dbCAN_3:HMMER 0.9872 0.0215 0.9800 0.9943 Eukaryote AA
dbCAN_3:DIAMOND 0.9866 0.0219 0.9793 0.9939 Eukaryote AA
dbCAN_3:eCAMI 0.9732 0.0234 0.9654 0.9810 Eukaryote AA
dbCAN_4 0.9880 0.0207 0.9811 0.9949 Eukaryote AA
dbCAN_4:HMMER 0.9872 0.0215 0.9800 0.9943 Eukaryote AA
dbCAN_4:DIAMOND 0.9869 0.0217 0.9797 0.9942 Eukaryote AA
dbCAN_4:dbCAN-sub 0.9872 0.0207 0.9803 0.9941 Eukaryote AA
CUPP 0.9793 0.0225 0.9719 0.9868 Eukaryote AA
dbCAN_2 0.9656 0.0257 0.9599 0.9713 All CBM
dbCAN_2:HMMER 0.9378 0.0296 0.9312 0.9444 All CBM
dbCAN_2:DIAMOND 0.9719 0.0252 0.9663 0.9775 All CBM
dbCAN_2:Hotpep 0.8823 0.0551 0.8701 0.8946 All CBM
dbCAN_3 0.9718 0.0228 0.9667 0.9768 All CBM
dbCAN_3:HMMER 0.9379 0.0300 0.9313 0.9446 All CBM
dbCAN_3:DIAMOND 0.9814 0.0185 0.9773 0.9855 All CBM
dbCAN_3:eCAMI 0.9267 0.0565 0.9141 0.9392 All CBM
dbCAN_4 0.9763 0.0207 0.9717 0.9809 All CBM
dbCAN_4:HMMER 0.9436 0.0300 0.9370 0.9503 All CBM
dbCAN_4:DIAMOND 0.9830 0.0182 0.9790 0.9870 All CBM
dbCAN_4:dbCAN-sub 0.9761 0.0244 0.9707 0.9816 All CBM
CUPP 0.8851 0.0851 0.8662 0.9040 All CBM
dbCAN_2 0.9672 0.0266 0.9587 0.9758 Bacteria CBM
dbCAN_2:HMMER 0.9400 0.0356 0.9286 0.9514 Bacteria CBM
dbCAN_2:DIAMOND 0.9718 0.0240 0.9641 0.9795 Bacteria CBM
dbCAN_2:Hotpep 0.8735 0.0634 0.8532 0.8937 Bacteria CBM
dbCAN_3 0.9707 0.0237 0.9631 0.9783 Bacteria CBM
dbCAN_3:HMMER 0.9400 0.0364 0.9283 0.9516 Bacteria CBM
dbCAN_3:DIAMOND 0.9760 0.0178 0.9703 0.9817 Bacteria CBM
dbCAN_3:eCAMI 0.9171 0.0605 0.8978 0.9365 Bacteria CBM
dbCAN_4 0.9855 0.0128 0.9814 0.9896 Bacteria CBM
dbCAN_4:HMMER 0.9494 0.0365 0.9377 0.9611 Bacteria CBM
dbCAN_4:DIAMOND 0.9755 0.0181 0.9697 0.9812 Bacteria CBM
dbCAN_4:dbCAN-sub 0.9892 0.0129 0.9851 0.9933 Bacteria CBM
CUPP 0.8614 0.1127 0.8253 0.8974 Bacteria CBM
dbCAN_2 0.9640 0.0250 0.9560 0.9720 Eukaryote CBM
dbCAN_2:HMMER 0.9356 0.0223 0.9284 0.9427 Eukaryote CBM
dbCAN_2:DIAMOND 0.9720 0.0267 0.9635 0.9806 Eukaryote CBM
dbCAN_2:Hotpep 0.8912 0.0445 0.8770 0.9054 Eukaryote CBM
dbCAN_3 0.9728 0.0221 0.9658 0.9799 Eukaryote CBM
dbCAN_3:HMMER 0.9359 0.0221 0.9288 0.9430 Eukaryote CBM
dbCAN_3:DIAMOND 0.9869 0.0178 0.9812 0.9926 Eukaryote CBM
dbCAN_3:eCAMI 0.9362 0.0512 0.9198 0.9526 Eukaryote CBM
dbCAN_4 0.9672 0.0230 0.9598 0.9745 Eukaryote CBM
dbCAN_4:HMMER 0.9379 0.0205 0.9313 0.9444 Eukaryote CBM
dbCAN_4:DIAMOND 0.9905 0.0150 0.9857 0.9953 Eukaryote CBM
dbCAN_4:dbCAN-sub 0.9631 0.0262 0.9547 0.9715 Eukaryote CBM
CUPP 0.9088 0.0286 0.8997 0.9180 Eukaryote CBM
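
Several upper confidence limits in the tables above exceed 1, although these scores cannot themselves exceed 1. One possible explanation, which the source does not confirm, is that the intervals are symmetric normal-approximation intervals around the mean of the per-test-set scores. The sketch below illustrates that assumption only; the function name, the z value of 1.96 and the example scores are hypothetical and are not taken from the evaluation.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, sd, lower, upper) for a list of per-test-set scores.

    Assumption: a symmetric normal-approximation interval,
    mean +/- z * sd / sqrt(n). Nothing here bounds the interval to [0, 1].
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    half_width = z * sd / math.sqrt(n)
    return mean, sd, mean - half_width, mean + half_width


# Hypothetical accuracy scores for six test sets (illustrative values only)
scores = [1.0, 1.0, 0.95, 1.0, 0.9, 1.0]
mean, sd, lower, upper = mean_with_ci(scores)
print(f"{mean:.4f} {sd:.4f} {lower:.4f} {upper:.4f}")
```

Under this assumption, a high mean combined with a non-trivial spread pushes the upper limit past 1, so such values are best read as "the interval reaches the maximum score".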

9 CAZy class multilabel classification tax performance

Table 9.1: Rand Index of CAZyme classifier classification of CAZy class annotations
Prediction_tool Lower CI Mean Upper CI Standard Deviation Tax_group
dbCAN_2 0.9755 0.9769 0.9782 0.0855 All
dbCAN_2:HMMER 0.9679 0.9694 0.9709 0.0969 All
dbCAN_2:DIAMOND 0.9778 0.9791 0.9804 0.0821 All
dbCAN_2:Hotpep 0.9445 0.9464 0.9483 0.1248 All
dbCAN_3 0.9828 0.9839 0.9850 0.0721 All
dbCAN_3:HMMER 0.9699 0.9714 0.9728 0.0939 All
dbCAN_3:DIAMOND 0.9865 0.9875 0.9885 0.0643 All
dbCAN_3:eCAMI 0.9593 0.9610 0.9627 0.1092 All
dbCAN_4 0.9845 0.9856 0.9866 0.0683 All
dbCAN_4:HMMER 0.9712 0.9727 0.9741 0.0920 All
dbCAN_4:DIAMOND 0.9871 0.9880 0.9890 0.0627 All
dbCAN_4:dbCAN-sub 0.9840 0.9851 0.9861 0.0694 All
CUPP 0.9580 0.9597 0.9614 0.1093 All
dbCAN_2 0.9766 0.9784 0.9802 0.0825 Bacteria
dbCAN_2:HMMER 0.9679 0.9700 0.9721 0.0958 Bacteria
dbCAN_2:DIAMOND 0.9784 0.9802 0.9819 0.0801 Bacteria
dbCAN_2:Hotpep 0.9438 0.9466 0.9493 0.1249 Bacteria
dbCAN_3 0.9825 0.9840 0.9856 0.0716 Bacteria
dbCAN_3:HMMER 0.9690 0.9710 0.9731 0.0944 Bacteria
dbCAN_3:DIAMOND 0.9853 0.9867 0.9882 0.0662 Bacteria
dbCAN_3:eCAMI 0.9619 0.9642 0.9666 0.1054 Bacteria
dbCAN_4 0.9863 0.9877 0.9891 0.0632 Bacteria
dbCAN_4:HMMER 0.9712 0.9732 0.9752 0.0911 Bacteria
dbCAN_4:DIAMOND 0.9848 0.9863 0.9877 0.0669 Bacteria
dbCAN_4:dbCAN-sub 0.9870 0.9884 0.9897 0.0614 Bacteria
CUPP 0.9568 0.9592 0.9616 0.1101 Bacteria
dbCAN_2 0.9734 0.9753 0.9772 0.0885 Eukaryote
dbCAN_2:HMMER 0.9666 0.9688 0.9709 0.0979 Eukaryote
dbCAN_2:DIAMOND 0.9762 0.9781 0.9799 0.0840 Eukaryote
dbCAN_2:Hotpep 0.9435 0.9462 0.9490 0.1248 Eukaryote
dbCAN_3 0.9822 0.9838 0.9854 0.0725 Eukaryote
dbCAN_3:HMMER 0.9697 0.9717 0.9737 0.0934 Eukaryote
dbCAN_3:DIAMOND 0.9869 0.9883 0.9896 0.0624 Eukaryote
dbCAN_3:eCAMI 0.9553 0.9577 0.9602 0.1128 Eukaryote
dbCAN_4 0.9819 0.9835 0.9851 0.0730 Eukaryote
dbCAN_4:HMMER 0.9701 0.9721 0.9741 0.0930 Eukaryote
dbCAN_4:DIAMOND 0.9886 0.9898 0.9911 0.0582 Eukaryote
dbCAN_4:dbCAN-sub 0.9801 0.9817 0.9834 0.0765 Eukaryote
CUPP 0.9578 0.9602 0.9625 0.1086 Eukaryote

10 CAZy family classification

This section evaluates the performance of the CAZyme classifiers at predicting CAZy family annotations.

10.2 CAZy family sensitivity against specificity

For better resolution, the CAZy families are grouped by their parent CAZy class, and the performance of the tools is compared class by class. Owing to the minimal variation in specificity scores, specificity was plotted as the log10 of the percentage specificity.
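
As a minimal sketch of the transformation described above, assuming that "percentage specificity log10" means the base-10 logarithm of the specificity expressed as a percentage (the function name is hypothetical):

```python
import math


def log10_percentage(specificity):
    """Convert a specificity in (0, 1] to log10 of its percentage value.

    Assumption: the plotted value is log10(specificity * 100).
    """
    if specificity <= 0:
        raise ValueError("log10 is undefined for a specificity of 0")
    return math.log10(specificity * 100)


# Illustrative values only: a perfect specificity of 1.0 maps to 2.0
for spec in (0.9, 0.99, 0.999, 1.0):
    print(spec, log10_percentage(spec))
```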

10.2.1 Glycoside Hydrolases

Figure 10.13 shows the plotting of sensitivity against specificity for each Glycoside Hydrolase CAZy family.

Figure 10.13: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Glycoside Hydrolases. Each GH CAZy family is represented as a single point on the plot.

10.2.2 Glycosyltransferases

Figure 10.14 shows the plotting of sensitivity against specificity for each Glycosyltransferase CAZy family.

Figure 10.14: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Glycosyltransferases. Each GT CAZy family is represented as a single point on the plot.

10.2.3 Polysaccharide Lyases

Figure 10.15 shows the plotting of sensitivity against specificity for each Polysaccharide Lyase CAZy family.

Figure 10.15: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Polysaccharide Lyases. Each PL CAZy family is represented as a single point on the plot.

10.2.4 Carbohydrate Esterases

Figure 10.16 shows the plotting of sensitivity against specificity for each Carbohydrate Esterase CAZy family.

Figure 10.16: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Carbohydrate Esterases. Each CE CAZy family is represented as a single point on the plot.

10.2.5 Auxiliary Activities

Figure 10.17 shows the plotting of sensitivity against specificity for each Auxiliary Activities CAZy family.

Figure 10.17: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Auxiliary Activities. Each AA CAZy family is represented as a single point on the plot.

10.2.6 Carbohydrate Binding Modules

Figure 10.18 shows the plotting of sensitivity against specificity for each Carbohydrate Binding Module CAZy family.

Figure 10.18: Scatter plot of recall (sensitivity) against specificity for predicting each CAZy family for each CAZyme classifier in the CAZy class Carbohydrate Binding Modules. Each CBM CAZy family is represented as a single point on the plot.

10.3 Consistently poor performing CAZy families

CAZy families were flagged as consistently poorly predicted when at least three of the evaluated classifiers (counting the individual tools incorporated into dbCAN separately) produced a sensitivity score of less than 0.75 for that family.
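
The selection rule above can be sketched as follows. This is a minimal illustration, not the code used in the evaluation: the function name, the nested-dictionary input layout and the family names and scores in the example are all hypothetical.

```python
def difficult_families(sensitivity, threshold=0.75, min_tools=3):
    """Return families for which at least `min_tools` tools score below `threshold`.

    `sensitivity` maps family name -> {tool name -> sensitivity score}
    (an assumed layout chosen for this sketch).
    """
    flagged = {}
    for family, scores in sensitivity.items():
        # Tools whose sensitivity for this family is strictly below the threshold
        low = sorted(tool for tool, score in scores.items() if score < threshold)
        if len(low) >= min_tools:
            flagged[family] = low
    return flagged


# Hypothetical scores, not results from this evaluation
example = {
    "family_A": {"dbCAN_4": 0.95, "dbCAN_4:HMMER": 0.70,
                 "dbCAN_4:DIAMOND": 0.97, "CUPP": 0.60},
    "family_B": {"dbCAN_4": 0.50, "dbCAN_4:HMMER": 0.40,
                 "dbCAN_4:DIAMOND": 0.74, "CUPP": 0.90},
}
print(difficult_families(example))
```

In this example only family_B is flagged, because three tools fall below 0.75 for it, whereas only two do so for family_A.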

10.4 GH difficult families

10.5 GT difficult families

10.6 PL difficult families

10.7 CE difficult families

10.8 AA difficult families

10.9 CBM difficult families

10.10 Evaluation of multi-label CAZy family classification performance

CAZy annotates proteins in a domain-wise manner. Consequently, a single protein may be assigned to multiple CAZy families. The ability of a classifier to assign all the correct CAZy family annotations to a given protein is not captured when CAZy family classification performance is evaluated per CAZy family, independently of all other CAZy families.

Multilabel classification arises when a single instance can be assigned to multiple classes. In this evaluation a single instance is a protein and the classes are CAZy families: a single CAZyme can be assigned to multiple CAZy families. This is important to take into consideration because the approaches used for the statistical evaluation of binary classification provide only a limited view of classifier performance when applied to multilabel classification.

The CAZy family multi-label classification performance is represented by the Rand Index (RI) and Adjusted Rand Index (ARI). The RI is a quantitative measure of the similarity between two clusterings: it considers all pairs of samples and counts the pairs that are assigned to the same or to different clusters in both the predicted and true clusterings. In this case the two clusterings are the predicted and ground truth CAZy family annotations. The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:

ARI = (RI - Expected_RI) / (max(RI) - Expected_RI)

This produces a score between -1 and 1. A score of 1 is produced if all predicted and known CAZy family annotations are identical, 0 if the clustering is completely random, and a negative score if the clustering is systematically incorrect, that is, if the number of incorrect classifications of proteins is greater than would be expected from randomly annotating proteins with CAZy families.
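
The RI and ARI definitions above can be illustrated with a small pair-counting sketch. It assumes, purely for illustration, that each protein is represented by two equal-length label vectors (ground truth and prediction) with one entry per CAZy family; the source does not state how the evaluation encoded its inputs, and the function name and example vectors are hypothetical.

```python
from collections import Counter
from math import comb


def rand_scores(true_labels, pred_labels):
    """Return (RI, ARI) for two labelings of the same items via pair counting."""
    n = len(true_labels)
    if n != len(pred_labels) or n < 2:
        raise ValueError("need two labelings of the same length (>= 2 items)")
    total_pairs = comb(n, 2)
    # Pairs placed together in both labelings, in truth only, in prediction only
    same_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    # Agreements = pairs together in both + pairs apart in both
    agreements = total_pairs + 2 * same_both - same_true - same_pred
    ri = agreements / total_pairs
    # Chance adjustment. RI is a linear function of same_both for fixed
    # marginals, so adjusting same_both is equivalent to adjusting RI.
    expected = same_true * same_pred / total_pairs
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        ari = 1.0  # degenerate case: both labelings are trivial and identical
    else:
        ari = (same_both - expected) / (maximum - expected)
    return ri, ari


# Hypothetical protein: 6 candidate families, 2 true annotations, 1 predicted
truth = [1, 1, 0, 0, 0, 0]
prediction = [1, 0, 0, 0, 0, 0]
print(rand_scores(truth, prediction))
```

scikit-learn provides `rand_score` and `adjusted_rand_score` in `sklearn.metrics` for the same quantities; the hand-written version is shown only to make the pair counting explicit.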

Table 10.8: Rand Index of CAZyme classifier classification of CAZy family annotations
Prediction_tool Mean Standard Deviation Lower CI Upper CI
dbCAN_2 0.9996 0.0013 0.9996 0.9997
dbCAN_2:HMMER 0.9995 0.0014 0.9995 0.9996
dbCAN_2:DIAMOND 0.9997 0.0012 0.9997 0.9997
dbCAN_2:Hotpep 0.9991 0.0023 0.9990 0.9991
dbCAN_3 0.9998 0.0011 0.9997 0.9998
dbCAN_3:HMMER 0.9996 0.0014 0.9995 0.9996
dbCAN_3:DIAMOND 0.9998 0.0010 0.9998 0.9998
dbCAN_3:eCAMI 0.9994 0.0017 0.9994 0.9994
dbCAN_4 0.9998 0.0011 0.9997 0.9998
dbCAN_4:HMMER 0.9996 0.0014 0.9996 0.9996
dbCAN_4:DIAMOND 0.9998 0.0009 0.9998 0.9998
dbCAN_4:dbCAN-sub 0.9998 0.0011 0.9997 0.9998
CUPP 0.9994 0.0015 0.9994 0.9995
Table 10.9: Adjusted Rand Index of CAZyme classifier classification of CAZy family annotations
Prediction_tool Mean Standard Deviation Lower CI Upper CI
dbCAN_2 0.9259 0.2568 0.9219 0.9299
dbCAN_2:HMMER 0.9145 0.2702 0.9103 0.9187
dbCAN_2:DIAMOND 0.9386 0.2367 0.9349 0.9422
dbCAN_2:Hotpep 0.8635 0.3219 0.8585 0.8684
dbCAN_3 0.9507 0.2127 0.9474 0.9540
dbCAN_3:HMMER 0.9201 0.2617 0.9161 0.9242
dbCAN_3:DIAMOND 0.9622 0.1878 0.9593 0.9651
dbCAN_3:eCAMI 0.8960 0.2928 0.8914 0.9005
dbCAN_4 0.9538 0.2050 0.9507 0.9570
dbCAN_4:HMMER 0.9221 0.2598 0.9180 0.9261
dbCAN_4:DIAMOND 0.9636 0.1847 0.9608 0.9665
dbCAN_4:dbCAN-sub 0.9544 0.2032 0.9512 0.9575
CUPP 0.9012 0.2832 0.8968 0.9055

11 CAZy family multilabel classification tax performance

Table 11.1: Rand Index of CAZyme classifier classification of CAZy family annotations per taxonomy group
Prediction_tool Lower CI Mean Upper CI Standard Deviation Tax_group
dbCAN_2 0.9996 0.9996 0.9997 0.0013 All
dbCAN_2:HMMER 0.9995 0.9995 0.9996 0.0014 All
dbCAN_2:DIAMOND 0.9997 0.9997 0.9997 0.0012 All
dbCAN_2:Hotpep 0.9990 0.9991 0.9991 0.0023 All
dbCAN_3 0.9997 0.9998 0.9998 0.0011 All
dbCAN_3:HMMER 0.9995 0.9996 0.9996 0.0014 All
dbCAN_3:DIAMOND 0.9998 0.9998 0.9998 0.0010 All
dbCAN_3:eCAMI 0.9994 0.9994 0.9994 0.0017 All
dbCAN_4 0.9997 0.9998 0.9998 0.0011 All
dbCAN_4:HMMER 0.9996 0.9996 0.9996 0.0014 All
dbCAN_4:DIAMOND 0.9998 0.9998 0.9998 0.0009 All
dbCAN_4:dbCAN-sub 0.9997 0.9998 0.9998 0.0011 All
CUPP 0.9994 0.9994 0.9995 0.0015 All
dbCAN_2 0.9996 0.9996 0.9997 0.0013 Bacteria
dbCAN_2:HMMER 0.9995 0.9996 0.9996 0.0014 Bacteria
dbCAN_2:DIAMOND 0.9997 0.9997 0.9997 0.0012 Bacteria
dbCAN_2:Hotpep 0.9989 0.9990 0.9990 0.0026 Bacteria
dbCAN_3 0.9997 0.9997 0.9998 0.0011 Bacteria
dbCAN_3:HMMER 0.9995 0.9996 0.9996 0.0014 Bacteria
dbCAN_3:DIAMOND 0.9998 0.9998 0.9998 0.0010 Bacteria
dbCAN_3:eCAMI 0.9993 0.9994 0.9994 0.0019 Bacteria
dbCAN_4 0.9998 0.9998 0.9998 0.0011 Bacteria
dbCAN_4:HMMER 0.9996 0.9996 0.9996 0.0013 Bacteria
dbCAN_4:DIAMOND 0.9998 0.9998 0.9998 0.0010 Bacteria
dbCAN_4:dbCAN-sub 0.9998 0.9998 0.9998 0.0011 Bacteria
CUPP 0.9994 0.9994 0.9995 0.0016 Bacteria
dbCAN_2 0.9996 0.9996 0.9997 0.0013 Eukaryote
dbCAN_2:HMMER 0.9995 0.9995 0.9996 0.0014 Eukaryote
dbCAN_2:DIAMOND 0.9997 0.9997 0.9997 0.0011 Eukaryote
dbCAN_2:Hotpep 0.9992 0.9992 0.9993 0.0019 Eukaryote
dbCAN_3 0.9997 0.9998 0.9998 0.0010 Eukaryote
dbCAN_3:HMMER 0.9995 0.9996 0.9996 0.0014 Eukaryote
dbCAN_3:DIAMOND 0.9998 0.9998 0.9999 0.0009 Eukaryote
dbCAN_3:eCAMI 0.9994 0.9994 0.9995 0.0015 Eukaryote
dbCAN_4 0.9997 0.9997 0.9998 0.0011 Eukaryote
dbCAN_4:HMMER 0.9995 0.9996 0.9996 0.0014 Eukaryote
dbCAN_4:DIAMOND 0.9998 0.9999 0.9999 0.0008 Eukaryote
dbCAN_4:dbCAN-sub 0.9997 0.9997 0.9997 0.0011 Eukaryote
CUPP 0.9994 0.9995 0.9995 0.0014 Eukaryote